Polynomial transformation model for frame-to-frame registration in an adaptive optics confocal scanning laser ophthalmoscope

Hao Chen; Hao Chen; Yi He; Yi He; Yi He; Yi He; Ling Wei; Ling Wei; Jinsheng Yang; Xiqi Li; Guohua Shi; Guohua Shi; Guohua Shi; Yudong Zhang

doi:10.1364/BOE.10.004589

1. Introduction

Adaptive optics (AO) technology was first integrated into a confocal scanning laser ophthalmoscope (SLO) in 2002, for the purpose of improving image resolution and acquiring near diffraction-limited retinal images [1]. In recent years, multiple studies have been devoted to developing high-performance AOSLO devices [2–7]. However, due to safety requirements and interference from various sources, the signal-to-noise ratio (SNR) in AOSLO images is very low. In addition, this modality suffers from the effects of fixational eye movements, which include various components from low to relatively high frequencies (∼100 Hz) [8,9]. As such, objects that appear to be fixed in a field of view form images that sweep across dozens of photoreceptors within milliseconds. Since AOSLO systems typically use small fields of view (∼0.75-1.5 degrees), eye movements can cause obvious distortions. Therefore, multi-frame averaging methods are typically utilized to improve image quality and remove such distortions, making accurate image registration an indispensable component of this modality. The primary objective of image registration (both feature- and intensity-based methods) is to find an optimal transformation that produces the best alignment between input images, particularly for the structures of interest (e.g., cone and rod photoreceptors). Selecting a suitable geometric transformation model is crucial to the success of a registration algorithm and is highly dependent on the nature of the image to be registered.

Several transformation models have been reported for AOSLO retina image registration in recent years. In general, these techniques can be divided into rigid and non-rigid classes. Rigid transformations are the simplest and can be defined in 2D space using 4 parameters or degrees of freedom (DoF): 2 translational and 2 rotational criteria. Such transformations are typically utilized in multi-frame averaging applications where consecutive images are rapidly acquired from a region of interest. Dubra et al. demonstrated that mapping functions between images can be described by linear translations if the scanning in ophthalmic instruments is sufficiently fast with respect to the speed of involuntary eye movements [10]. However, in AOSLO imaging, real mapping relationships between images are much more complex due to gaps between high-frequency fixational eye movements (0-150 Hz) and the relatively low-frequency imaging rate of AOSLO systems (25-60Hz). Some researchers have proposed operating on image patches instead of the whole image in order to satisfy this approximation. Vogel et al. showed that the horizontal scan rate is several orders of magnitude faster than the vertical scan rate [11]. This motion can be approximated by drift during short time intervals if each image frame is partitioned into several patches that extend horizontally across the frame and are some fraction (typically 1/8 or 1/16) of the vertical frame height. As such, rigid transformations are often combined with patch-based cross-correlation approaches for estimating retina motion [10–14]. The primary advantage of patch-based registration methods is an ability to remove inter- and intra-frame distortions. However, this technique suffers from low accuracy due to the poor representation abilities of rigid transformations (especially for frames with large distortions) [12]. The non-rigid transformation class includes similarity (translation, rotation, and uniform scaling), affine (translation, rotation, scaling, and shearing), and curved transformations that are the primary resource for frame-to-frame registration of AOSLO retina images, due to better representation abilities. For example, Li et al. [15] and Chen et al. [16] used second-order polynomial transformations to align AOSLO images. Non-rigid transformations have also been combined with patch-based methods. For example, Faisan et al. [12] registered lines of AOSLO retina images using multi-scale B-spline representations of a 1D deformation field. While this method is highly accurate, it is also quite time-consuming. Recently, affine transformations have been utilized to accomplish multi-modal automatic montaging of AOSLO retina images [17,18]. Frame-to-frame non-rigid registration is advantageous because of its simplicity, as most techniques include only three steps: (a) identifying matched point pairs, (b) calculating transformation model parameters, and (c) obtaining registered images using the calculated transformation matrix [15,19]. However, with the exception of the multi-scale B-spline non-rigid registration method proposed by Faisan et al, such techniques struggle with intra-frame distortions caused by disequilibrium between the high complexity of eye motion and the limited representation abilities of transformation models. This process is shown in Fig. 1.

Fig. 1. Retinal images acquired in a healthy 30-year-old male. From left to right: (a) the reference frame, (b) the distorted frame, (c) and (d) the registration image by using patch-based method, and the frame-to-frame method based on second-order polynomial transformation model, respectively. Zoomed-in regions in blue boxes show the distorted areas which cannot be registered by these methods, while the zoomed-in regions in red boxes present the newly introduced distortions due to the poor registration. The scale bar is 50 µm.

Download Full Size | PDF

The primary objective of this study is to develop a suitable transformation model for frame-to-frame AOSLO image registration. Following the study of retina motion by Mulligan [20], we propose the use of high-order polynomial transformation models to describe the mapping relationship between AOSLO frames. With the higher-order polynomial transformation model established, more general retinal motions such as scale changes, shearing and rotation motions and so on could be taken into consideration. Presented experimental results demonstrate that such transformations are beneficial for improving registration accuracy. However, algorithm complexity rapidly increased with polynomial order while improvements in accuracy were limited. In order to select the most suitable polynomial transformation model, we explored the relationship between the number of point pairs, the distribution uniformity of these point pairs, the polynomial order, and the registration performance. We further improved the corresponding feature detection method to achieve more accurate registration and proposed the concept of maximum valid area to exclude unsuccessfully registered regions.

The remainder of this paper is organized as follows. The retina motion model is introduced in Section II. A high-order polynomial transformation model is then constructed and a close-form solution is provided. Section III validates the efficiency of this high-order polynomial transformation model using real AOSLO retina images. In addition, we also studied the relationship between the number of point pairs, the distribution uniformity of these point pairs, the polynomial order, and the registration performance. In Section IV, we conclude the paper with closing observations and a discussion of future research.

2. Methodology

2.1 Identifying matched point pairs

Before introducing the transformation models applied to AOSLO image registration, we first introduce the method used to find matched point pairs. This is a critical step, as the accurate calculation of model parameters requires a sufficient quantity of matched pairs. These points were acquired using the accelerated KAZE (AKAZE) [21] feature detection technique and the RANSAC [22] algorithm. As demonstrated by Chen et al. [16], AKAZE features are based on nonlinear scale space and achieve better performance than traditional scale invariant features (SIFTs) [23] or sped-up robust features (SURFs) [24]. As such, we utilized the AKAZE algorithm to extract approximately 1,000 feature points from each image. Then the nearest neighbor distance ratio matching criteria [23] with a distance ratio value equaling 0.8 was utilized to determine the tentative matched feature points. Finally, the correctly matched point pairs were identified by the RANSAC method. RANSAC method has been demonstrated to work well for selecting matched point pairs and removing outliers [16,18]. Considering that there are nonlinear deformations in AOSLO images, we set the match threshold parameter of RANSAC method to 10 (the match threshold is a parameter to discriminate the inliers and outliers, if the error of the candidate matching is less than the match threshold, then the matched point is considered as an inlier and vice versa).

2.2 Review of conventional transformation models

Two of the most widely-used image registration methods in AOSLO image registration are the frame-to-frame method (which is based on the second-order polynomial transformation model [15]) and the strip-based method (which is based on cross-correlation and linear transformation [14,25]). As seen in Fig. 1, these models struggle with large distortions. This is evident in the central area of the registered images represented by the blue boxes in Fig. 1, showing compressed photoreceptor cells and vessels. New distortions were introduced along the bottom of the aligned images (e.g., the red boxes in Fig. 1), where photoreceptor cells are heavily compressed due to poor registration. These results indicate that current transformation models are not sufficient for describing the practical differences between two frames. As such, a more suitable model is required for AOSLO retina image registration.

2.3 Retina motion model

The proposed retina motion model was developed from the work of Mulligan [20], who represented retina motion using second-order polynomials. This model can be defined as:

(1)$${p_x}(t )= {x_0} + {\dot{p}_x}(0 )t + \frac{1}{2}{\ddot{p}_x}(0 ){t^2},$$

(2)$${p_y}(t )= {y_0} + {\dot{p}_y}(0 )t + \frac{1}{2}{\ddot{p}_y}(0 ){t^2},$$

where ${\dot{p}_x}(0 )$ and ${\dot{p}_y}(0 )$ are the velocities of the target point in the horizontal and vertical directions, (x₀, y₀) are the coordinates of the target point at time t = 0, ${\ddot{p}_x}(0 )$ and ${\ddot{p}_y}(0 )$ are the initial acceleration terms, t is the current time, and (p_x(t), p_y(t)) are the corresponding points in the image. The time at which a point trajectory intersects the raster scan is represented by:

(3)$${t_p} = \frac{{{y_p}}}{{{V_{S,\;y}}}} + \frac{{{x_p}}}{{{V_{S,\;x}}}},$$

where (x_p, y_p) is the position of the image point at time t_p, V_S_,x and V_S_,y are the scan velocities.

In this model, only the velocities and accelerations of the target were considered. However, in practice, retina motion is far more complex. Therefore, in this paper, the target position was expressed using a high-order Taylor series:

(4)$${p_x}(t )= {x_0} + {\dot{p}_x}(0 )t + \frac{1}{2}{\ddot{p}_x}(0 ){t^2} + \ldots + \frac{1}{{n!}}{\mathop p\limits^n}_{x}(0 ){t^n},$$

(5)$${p_y}(t )= {y_0} + {\dot{p}_y}(0 )t + \frac{1}{2}{\ddot{p}_y}(0 ){t^2} + \ldots + \frac{1}{{n!}}{\mathop p\limits^n}_{y}(0 ){t^n}.$$

Here, n is the order of the polynomial model and ${\mathop p\limits^n}_{x}(0 )$ and ${\mathop p\limits^n}_{y}(0 )$ denote the n^th-order derivatives at time t = 0. Substituting (3) into (4) and (5) produces the conventional n^th-order polynomial model at time t_p:

(6)$${x_p} = {x_0} + \sum\limits_{j = 1}^n {{K_j}({{x_p},{y_p}} )} ,$$

(7)$${y_p} = {y_0} + \sum\limits_{j = 1}^n {{M_j}({x,\;y} )} ,$$

where

(8)$${K_j}({x,\;y} )= - \sum\limits_{u \ge 0,\;v \ge 0 \atop u + v = j} {{k_{u,\;v}}{x^u}{y^v}} ,$$

and

(9)$${M_j}({x,\;y} )= - \sum\limits_{u \ge 0,\;v \ge 0 \atop u + v = j} {{m_{u,\;v}}{x^u}{y^v}} .$$

Here, k and m are the coefficient vectors for the polynomial model.

2.4 Polynomial transformation model and close-form solution

The initial coordinates of a target point in the reference frame are given by (x₀, y₀), then the corresponding position in the query frame is given by (x₀-k_0,0, y₀-m_0,0), where k_0,0 and m_0,0 denote two translation variables. The transformation between two frames can then be approximately described by:

(10)$${x_r} = - \sum\limits_{j = 1}^n {{K_j}({{x_q},{y_q}} )} + {k_{0,0}},$$

(11)$${y_r} = - \sum\limits_{j = 1}^n {{M_j}({x,\;y} )} + {m_{0,0}},$$

where (x_r, y_r) is a point in the reference frame and (x_q, y_q) is a corresponding point in the query image. This transformation becomes a linear translation for n = 0. Eqs. (10) and (11) can also be expressed using matrix-vector multiplication:

(12)$$\begin{aligned} \left( \begin{array}{l} {x_r}\\ {y_r} \end{array} \right) &= \left( {\begin{array}{ccccccc} {{k_{0,0}}}&{{k_{1,0}}}&{{k_{0,1}}}&{{k_{1,1}}}&{\ldots }&{{k_{n,0}}}&{{k_{0,\;n}}}\\ {{m_{0,0}}}&{{m_{1,0}}}&{{m_{0,1}}}&{{m_{1,1}}}&{\ldots }&{{m_{n,0}}}&{{m_{0.n}}} \end{array}} \right). \nonumber \\ & \quad {\left( {\begin{array}{ccccccc} 1&{{x_q}}&{{y_q}}&{{x_q}{y_q}}&{\ldots }&{{x_q}^n}&{y_q^n} \end{array}} \right)^T} \end{aligned}$$

The polynomial transformation parameters can be determined from a set of matched points as follows. Assuming w pairs of points were identified in the reference and query images, a matrix R can be defined that contains the coordinates of the w matched points in the reference image:

(13)$$R = \left[ {\begin{array}{cccc} {{x_{r,1}}}&{{x_{r,2}}}&{\ldots }&{{x_{r,\;w}}}\\ {{y_{r,1}}}&{{y_{r,2}}}&{\ldots }&{{y_{r,\;w}}} \end{array}} \right].$$

The matrix Q contains the corresponding points in the query image:

(14)$$Q = {\left[ {\begin{array}{ccccccc} 1&{{x_{q,1}}}&{{y_{q,1}}}&{{x_{q,1}}{y_{q,1}}}&{\ldots }&{x_{q,1}^n}&{y_{q,1}^n}\\ 1&{{x_{q,2}}}&{{y_{q,2}}}&{{x_{q,2}}{y_{q,2}}}&{\ldots }&{x_{q,2}^n}&{y_{q,2}^n}\\ {\ldots }&{\ldots }&{\ldots }&{\ldots }&{\ldots }&{\ldots }&{\ldots }\\ 1&{{x_{q,\;k}}}&{{y_{q,\;k}}}&{{x_{q,\;k}}{y_{q,\;k}}}&{\ldots }&{x_{q,\;k}^n}&{y_{q,\;k}^n} \end{array}} \right]^T}.$$

The parameter matrix in (12) is denoted by P and a closed-form solution can be acquired from:

(15)$$P = R{Q^T}{({Q{Q^T}} )^{ - 1}}.$$

The number of degrees of freedom (DoF) for the n^th-order polynomial transformation model can also be formulated as:

(16)$$DoF = \frac{{({n + 2} )({n + 1} )}}{2}.$$

Actually, the number of DoF determines the number of deformations which can be corrected by the current transformation model, and the n^th-order polynomial transformation parameters can only be determined if the number of matched point pairs is larger than the number of DoF.

3. Experimental validation

3.1 Data acquisition

In this study, datasets were acquired from five healthy subjects. Prior the imaging, human eyes were dilated with phenylephrine hydrochloride (2.5%) and tropicamide (1%) to a diameter of 6-8 mm. Subjects were asked to fixate (using the imaged eye) at a target as steadily as possible while the AOSLO videos were acquired (when the imaging rate is 30 Hz). All light exposures adhered to the maximum permissible exposure limits set by the American National Standards Institute standards [26]. Two datasets were captured from each subject, and each dataset had 100 frames or much more. The frame size is 512×449 pixels, the intensity of the frame is between [0, 255] and the field of view (FOV) is 1.5° on the human retina (approximately 445 µm assuming a focal length of 17 mm for the human eye).

3.2 Evaluation metrics

In practical settings, the retina is always in motion. While the relatively high imaging rate of AOSLO (30+ Hz) makes it possible to align retina images, the relationship between any two frames is difficult to determine due to eye movements. Furthermore, as noted by Chen et al. [16], true transformations cannot be acquired using calibration-based techniques. This prevents the direct evaluation of alignment accuracy for different transformation models. As such, in this study, we evaluate model efficiency and performance. Since we have no knowledge of the true transformation, a similarity index or correlation score was used in overlapping regions of the reference and query images.

Two common assessment metrics were utilized in this study, including normalized cross-correlation (NCC), and the structural similarity index (SSIM). Since our AOSLO image frames are acquired consecutively, they are similar in both intensity and texture. This facilitated the use of intensity-based correlation metrics NCC [16–18]. The NCC value ranges from −1 to 1, where 1 represents one perfect alignment. The SSIM is based on the human visual system and provides good image quality prediction for a wide variety of image distortions [27,28]. The SSIM value ranges from 0 to 1, and the maximum SSIM value of 1 is achieved only if the two images are identical. Finally, the quality of averaged images was measured using the contrast method introduced by Chen et al. [16], in which the larger the image contrast, the better the image quality.

3.3 Experimental results

This section compares the registration performance of polynomial transformation models of varying orders. A traditional patch-based registration method was adopted as the anchor and the first frame was used as a reference. NCC, and SSIM values were set to 0 once the number of matched point pairs was smaller than the corresponding DoF.

Assessment metric values for a single subject are presented in Fig. 2 and the statistical results of all datasets are included in Table 1 and 2. It is evident from Fig. 2(a) and (d) that the traditional patch-based registration method performed better than the frame-to-frame registration method with low-order (e.g. 1-th and 2-th order) polynomial transformation models adopted. But the proposed polynomial transformation models produced higher NCC and SSIM values in almost every frame once the polynomial order increased to 3-th or higher orders. This suggests an ability to describe real retina motion that is superior to the patch-based registration method. Furthermore, Fig. 2 demonstrates that although higher-order (i.e., 5^th and 6^th) polynomial models achieved the best performance for query frames close to the reference image (the middle column of Fig. 2), their performance gradually decreased and eventually became worse than low-order models when the query frame departed from the reference frame (the last column of Fig. 2). This is because higher-order polynomial models have higher DoF and more point pairs are required to calculate an accurate transformation matrix (which was carefully studied in the following part of this article, and the necessary amount of point pairs for each polynomial transformation model was determined clearly). As the frame number increases, differences between the reference and query frames become larger and fewer matched point pairs are available. This prevents the acquisition of sufficiently accurate transformation parameters using these point pairs.

Fig. 2. The NCC and SSIM values of one dataset. From top to bottom: (a-c) the NCC results, (d-f) the SSIM results. The sub-lines in blue and pink boxes at the first column are enlarged on the middle and last columns, respectively.

Download Full Size | PDF

Table 1. The Mean Values (Mean) and Standard Deviations (Stds) of Normalized Cross-correlation (NCC) Values of Ten Datasets. The Largest Mean and The Smallest Std in Each Column Are Highlighted in Bold.

View Table | View all tables in this article

Table 2. The Mean Values (Mean) and Standard Deviations (Stds) of Structural Similarity Index (SSIM) Values of Ten Datasets. The Largest Mean and The Smallest Std in Each Column Are Highlighted in Bold.

View Table | View all tables in this article

Especially, we can easily observe that there are obvious peaks in Fig. 2. In practice, these peaks can be divided into two types. The first kind of peaks, whose values are much larger than 0, are caused by micro-saccades, such as the peaks at 11^th and 21^th frames. Observing the zoomed-in regions suggests that higher-order polynomial transformation models are useful for registering such frames. The other type (frames 42-44 and 96-100) is caused by the subject blinking. This produces empty frames where the AKAZE feature detection method cannot find enough matched point pairs (more than DoF).

The results in Table 1 and 2 are consistent with these findings. Especially, the standard deviations (Stds) of NCC and SSIM values obtained with different polynomial transformation models in all datasets follow a similar distribution in most cases. It is relatively large for first-order polynomials and decreases significantly with increasing order, increasing again for high orders when the order exceeds 3 or 4. The average standard deviation for all datasets is shown in Fig. 3, and it is obvious that the distribution of Std values is convex and the minimum Std value is obtained at medium-order cases (3^th or 4^th order). One possible explanation for this trend is that the registration performance is restricted in low-order cases by poor representation abilities. As a result, some query frames cannot be well registered, which increases the standard deviation. Conversely, in high-order cases, the lack of sufficient point pairs becomes a primary restriction that decreases the stability of registration performance. In mid-order cases (3^th or 4^th order), the polynomial transformation model and the number of matched point pairs are mutually satisfied and the registration performance is relatively steady.

Fig. 3. The averaged standard deviations of NCC and SSIM metrics in all datasets for Table 1 and 2.

Download Full Size | PDF

These results indicate that registration performance is determined by both the number of point pairs and the order of the polynomial transformation model. In this section, we study the relationship between the order of a polynomial transformation model and the number of matched point pairs by estimating registration performance. We first extracted 100 frames, consisting of 10 frames from each dataset, acquired evenly for every tenth frame. We then calculated the NCC and SSIM results in these frames for varying numbers of matched point pairs. The number of points was modified by firstly extracting as many feature points from each image as possible. We then selected the total number of matched point pairs using the method introduced in Section 2. Afterwards, these point pairs were sorted according to the distance of matched feature descriptors. Finally, certain quantities of matched points were extracted from the sorted pairs and polynomial transformation models of different orders were utilized to register the images.

Averaged results are presented in Fig. 4, where it is evident that the limitation of registration performance is determined by the transformation model. In theory, higher-order polynomial transformation models should achieve better registration performance. However, the actual result is heavily influenced by the number of matched point pairs. It is clear from Fig. 4 that although every transformation model benefited from more point pairs, higher-order polynomials were more sensitive to this quantity (i.e., performance decreased rapidly if the number of pairs was too small). In addition, fewer points were needed to achieve steady performance using lower-order polynomials and increasing this quantity further did not noticeably improve performance. Figure 4 also provides information for choosing a suitable transformation model when the number of matched point pairs is known, and high-order (≥3) polynomial transformation model is proposed to be adopted, which indeed has better ability to achieve accurate registration than the patch-based method does. The results shown in Table 3 were acquired using the AKAZE feature detection method, and is shown to make some guidelines for selection of the specific polynomial transformation model order according to the number of matched point pairs.

Fig. 4. The averaged registration performances obtained by utilizing different number of point pairs and different transformation models. (a) NCC results and (b) SSIM results.

Download Full Size | PDF

Table 3. Guidelines to Choose the Best Suitable Transformation Model According to the Number of Detected Point Pairs.

View Table | View all tables in this article

Considering the requirement of real-time managements, and the fact that we usually cannot find more than 600 matched point pairs for every frame. We decided to use the fourth-order polynomial transformation model to accomplish the frame-to-frame registration. On an Intel Core i5-2430 CPU 2.4 GHz laptop computer with the Opencv 3.4.1 library, we took about 47 ms to align an image by using this proposed model. The following experimental results were all obtained with this model.

This experiment analyzed the influence of matched point pair quantities on registration accuracy. However, we found that the distribution of these points also plays an important role in successful registration, especially for higher-order polynomials. Figure 5 presents one unsuccessful case in which it is evident that even for normal frames, without large intra-frame distortions, the registration is not accurate in all areas of the query frame, e.g. the compressed cone on the bottom-left corner of Fig. 5(e). It is because that there are almost no matched point pairs in the bottom-left corner of the two frames, as presented in Fig. 5(c). In practice, finding matched point pairs in image corners is critical for calculating accurate transformation parameters, due to the existence of higher-order terms in the model. The bottom-left corner of Fig. 5(d) indicates that patch-based method cannot well align the bottom half of the query image, and this is caused by small nonlinear deformations existed in this area. In this study, two methods were proposed to reduce the influence of uneven point pair distributions.

Fig. 5. An unsuccessful registration case due to the non-uniformity distribution of matched point pairs. The retina images were acquired from a 38-year-old healthy female. (a) The reference frame, (b) the query frame, (c) the result of matched point pairs obtained with original AKAZE method, (d) the registered image by using the patch-based method, (e) image registered by proposed method with the matched point pairs in (c) used, (f) matched point pairs by using the improved AKAZE method and (g) image registered by proposed method with the matched point pairs in (f) used. The red dot denotes a tracked cone and the green box is added to evaluate the relative position of the tracked cone. The scale bar is 50 µm.

Download Full Size | PDF

One of these was an optimized feature detection method. In the original AKAZE algorithm, the border for feature point detection is defined as:

(17)$${B^i} = 10\sqrt 2 {\sigma _i},$$

where ${\sigma _i}$ represents the i^th blurring scale and ${B^i}$ denotes a border used for detecting feature points at that scale. For example, a minimum blurring scale of 1.6 in the AKAZE algorithm indicates that at least 22 rows or columns near the image edges were not considered in the process of feature point detection. In order to identify additional feature points in areas near the edges of an image, we propose reducing the algorithm border using:

(18)$${B^i} = \frac{{10\sqrt 2 }}{c}{\sigma _i},$$

where c is a positive constant. In our experiment, we found that setting c equal to 2 produced good results. Larger values did not improve performance or lead to algorithm failure. As seen in Fig. 5(f), this optimized AKAZE technique produced more matched point pairs near the edges and corners of each image, which resulted in more accurate registration results (e.g., the bottom-left corner region in Fig. 5(g)).

While this improved feature detection method is beneficial for finding additional matched point pairs between frames, it does not work well in all cases. Specifically, it encounters difficulties for frames with fewer features (photoreceptor cells in the edge and corner areas) or large distortions. In these cases, we propose the use of maximum valid area (MVA) to exclude inaccurate registration regions, and experimental results have been shown in Fig. 6. Feature descriptors are constructed by collecting information in small areas near feature points (i.e., a square with sides equal to 20 in the AKAZE method). Therefore, MVA could be determined by considering matched points near the reference image edges and corresponding feature descriptor regions. As seen in Fig. 6(g), MVA was constructed by first identifying four matched points near the corners. The valid region was then determined based on the area required for feature descriptor construction, indicated by the blue lines in Fig. 6(h). As presented in Fig. 6(i), it is obvious that the registration process was successful by using the proposed transformation model and the MVA strategy. However, the patch-based method cannot well register such frames in Fig. 6(c), because the large distortion existed in these images. The black regions near the edges of these images were outside the query image, which indicated a severity of eye motions to disturb the image registrations. To distinguish the registration performance clearly, we also selected a cone photoreceptor and marked it as a red dot with a green box containing it in the reference image, and has been shown in Fig. 6. The real valid region has been marked by a human rater, and is shown in red box in Fig. 6(h). Compared to the calculated MVA labelled in blue box, it is apparent that most of the successfully registered area (about 95%) can be well segmented by this proposed method. The segmented image in Fig. 6(i) has higher metric values than Fig. 6(d), which further indicated the efficiency of the proposed MVA strategy.

Fig. 6. An example validates the effectiveness of the proposed maximum valid area method. The retina images were acquired from a 28-year-old male. (a) The reference image, (b) the query frame, (c) the registered image by using the patch-based method, (d) the registered image by the proposed method, (e) matched point pairs by using the improved AKAZE method, (f) the distribution of matched keypoints in the reference image, (g) diagram of the maximum valid region, (h) the difference between the calculated maximum valid region (blue box) and the real valid region (red box, selected by human rater) and (i) the segmented valid image of (d). The red dot denotes a tracked cone and the green box is added to evaluate the relative position of the tracked cone. The scale bar is 50 µm.

Download Full Size | PDF

The experimental results shown in Fig. 5 and 6 demonstrate registration performance for normal retina frames (without large intra-frame distortions). However, several images suffered from significant micro-saccade motion in almost every dataset. As seen in Fig. 1, traditional patch-based method struggle with such distortions. In contrast to normal frames, the matched point pairs in these distorted images are typically sparser and exhibit non-uniform distributions. This is mostly caused by micro-saccade motion, as seen in Fig. 7(b), where a large distortion in the query frame leads to as large as 53 pixels and 31 pixels translations in horizontal and vertical directions, respectively. However, as shown in Fig. 7(d), the registration process was still successful due to a use of the proposed transformation model and MVA strategies, which can be validated by observing the relative positions of marked cones and green boxes.

Fig. 7. Image registration with micro-saccade motions. (a) The reference image;(b) the distorted image;(c-d) the registered images by using patch-based and the proposed methods, respectively; (e) the comparison of the maximum valid region (blue box) and the real valid region (red box); (f) the segmented area by the MVA strategy; (g-h) the averaged residual values of lines in horizontal and vertical directions, respectively. The scale bar is 50 µm.

Download Full Size | PDF

This efficiency is further demonstrated by the averaged relative error in both the horizontal and vertical directions, as shown in Fig. 7(g) and (h) (only pixels in the maximum valid region were considered). It is evident that the proposed model produced fewer errors in distorted areas than conventional models (e.g., lines 70-250 in the horizontal direction). Specifically, it seems that the proposed method performed relatively worse at horizontal lines 250 to 300 (similarly at vertical lines of 140 to 160), it is because that in this area, there are less photoreceptors and the calculated errors are inaccurate and even converse to the fact. To give a better comparison, we extracted these areas from the images and presented them in Fig. 8, it is obvious that the proposed method achieved more accurate registration and had better visual quality.

Fig. 8. The lines (from 250 to 300) extracted from the images in Fig. 7. From top to bottom, the reference image, the query image, the image registered by patch-based and the image registered by the proposed method, respectively.

Download Full Size | PDF

As seen in Fig. 4, fourth-order transformations provided a balanced trade-off between registration performance and speed, without requiring large quantities of point pairs. As such, a fourth-order model was applied to superpose all frames in the dataset. The results are shown in Fig. 9 and Table 4, where it is evident that averaged images exhibit higher contrast and superior registration performance. As shown in the zoomed-in areas, the cones in Fig. 9(c) are clearer and brighter than those in Fig. 9(b), which indicated that the proposed method is more efficient than patch-based method. The horizontal intersecting lines presented in Fig. 9(d) indicated that the averaged image obtained by the proposed method has higher contrast in most cases, resulting from a better remove of noise.

Fig. 9. Comparison of averaged images obtained with different methods. (a) The reference image, (b) the averaged images obtained by using the patch-based method, (c) averaged image obtained by the proposed method, (d) the horizontal intersecting lines in the same position of (b-c). The scale bar is 50 µm.

Download Full Size | PDF

Table 4. The Image Contrasts of Averaged Images of Different Datasets. The Largest One in Each Column Is Highlighted in Bold.

View Table | View all tables in this article

In addition, this fourth-order polynomial transformation model was used to estimate retina motion. Corresponding transformation parameters were calculated using Eq. (12), the mean (Mean) and standard deviation (Std) of which are shown in Table 5. Since transformation parameters are signed, the mean is smaller and the standard deviation is larger. Differences between subjects also increased the standard deviation and it is obvious that Stds of k_0,0 and m_0,0 are the largest values, suggesting translation to be the primary component of retina motion. However, the presence of other non-zero parameters indicates complex components in real retina motion. The Std and Mean values of k_1,0 are smaller than corresponding values for m_0,1, which is indicative of additional distortions introduced by vertical scanning. This is consistent with the acquisition pattern as the speed of horizontal scanning is several orders of magnitude faster than that of vertical scanning. As such, restoring parameters corresponding to high-order terms is critical for improving registration performance.

Table 5. The Mean (Mean) and Standard Deviation (Std) values of The Polynomial Transformation Parameters. The Largest Value in Each Column Is Highlighted in Bold.

View Table | View all tables in this article

4. Discussion and conclusion

4.1 Discussion

The experimental results presented above suggest that real retina motion is highly complex but can be represented accurately using the proposed high-order polynomial model. In contrast to conventional techniques, this method considers general retina motion such as scaling, shearing, and rotation changes in high-resolution images. In addition, our results suggest transformation models to be the primary limitation of registration performance. Higher-order polynomials generally offer better performance when provided with sufficient point pairs. The distribution of matched points also had an effect on this process as uniform distributions produced better results. This study demonstrated that the AKAZE feature detection method typically cannot extract enough point pairs in the corners of an image. As such, we improved the algorithm by reducing the border used for feature point detection. We also proposed the concept of maximum valid area to exclude inaccurate registration regions. The efficiency of this process was verified experimentally.

The proposed model was also used to register images distorted by micro-saccade motion. Results demonstrated the robustness of this technique, as it was able to process large distortions that are often problematic for conventional algorithms. Experimental results also demonstrated that the proposed high-order transformation model could improve image contrast.

Compared with traditional patch-based method, the largest advantage of the proposed method is that it can well deal with frames with large intra-frame distortion (i.e. reshape the compressed cones). Moreover, the proposed method can automatically skip the black frames due to subject blinking. On the other hand, there are two main drawbacks of the proposed method. Firstly, it is heavily relied on the feature detection method. The proposed method can obtain good performance if and only if there are large amount of accurately matched point pairs. The other disadvantage of the proposed method is that the real-time performance of the current version of the proposed method is not satisfying, and would continue to optimize in the future work.

4.2 Conclusion and future research

In this paper, we first established the high-order retina motion model by using the Taylor series and polynomial expansion. Then we generated the polynomial transformation model and provided its close-form solution for consecutively frame-to-frame AOSLO retina image registration. The experimental results demonstrated that higher-order polynomial transformation models are helpful to achieve more accurate registration. However, with the increasing of the order, the algorithm complexity is rapidly increasing whereas the accuracy performance promotion is limited. In order to select the most suitable polynomial transformation model, we explored the relationship between the number of point pairs, the distribution uniformity of these point pairs, the polynomial order, and the registration performance. We further improved the corresponding feature detection method to achieve more accurate registration and propose the concept of maximum valid area to exclude unsuccessfully registered regions.

Several future efforts can be taken to extend this work. Firstly, the performance of high-order polynomial transformation models can be further improved by using multi-modal images simultaneously. It is obvious that more matched point pairs can be found in multi-modal images [17,18], which allows one to use higher-order polynomial transformation model and deal with images which have more complex distortions. Secondly, improving the performance of the feature detection method is also significant to promote the registration accuracy. Combining the patch-based registration and the polynomial transformation model together seems to be a good idea to achieve more accurate alignment. Thirdly, a faster version of the proposed method can be obtained by using the GPU technique, in order to realize a real-time processing. Moreover, exploiting the relationship between the imaging frame rate of AOSLO system and the order of polynomial is significant to choose suitable transformation model. Last but not least, with the development of deep learning technique, training the convolutional neural networks (CNN) to automatically obtain the parameters of polynomial transformation models is a very creative, and viable solution to study [29–33].

Funding

Jiangsu Province Science Fund for Distinguished Young Scholars (BK20160010); National Natural Science Foundation of China (61605210); National Basic Research Program of China (973 Program) (2016YFC0102500, 2017YFB0403700) National Instrumentation Program (2012YQ120080).

Acknowledgments

We thank LetPub (www.letpub.com) for its linguistic assistance during the preparation of this manuscript.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

References

1. A. Roorda, F. Romero-Borja, W. J. Donnelly III, H. Queener, T. J. Hebert, and M. C. Campbell, “Adaptive optics scanning laser ophthalmoscopy,” Opt. Express 10(9), 405–412 (2002). [CrossRef]

2. S. A. Burns, R. Tumbar, A. E. Elsner, D. Ferguson, and D. X. Hammer, “Large-field-of-view, modular, stabilized, adaptive-optics-based scanning laser ophthalmoscope,” J. Opt. Soc. Am. A 24(5), 1313–1326 (2007). [CrossRef]

3. R. D. Ferguson, Z. Zhong, D. X. Hammer, M. Mujat, A. H. Patel, C. Deng, W. Zou, and S. A. Burns, “Adaptive optics scanning laser ophthalmoscope with integrated wide-field retinal imaging and tracking,” J. Opt. Soc. Am. A 27(11), A265–A277 (2010). [CrossRef]

4. A. Dubra and Y. Sulai, “Reflective afocal broadband adaptive optics scanning ophthalmoscope,” Biomed. Opt. Express 2(6), 1757–1768 (2011). [CrossRef]

5. N. D. Shemonski, F. A. South, Y.-Z. Liu, S. G. Adie, P. S. Carney, and S. A. Boppart, “Computational high-resolution optical imaging of the living human retina,” Nat. Photonics 9(7), 440–443 (2015). [CrossRef]

6. J. Lu, B. Gu, X. Wang, and Y. Zhang, “High speed adaptive optics ophthalmoscopy with an anamorphic point spread function,” Opt. Express 26(11), 14356–14374 (2018). [CrossRef]

7. S. Mozaffari, V. Jaedicke, F. Larocca, P. Tiruveedhula, and A. Roorda, “Versatile multi-detector scheme for adaptive optics scanning laser ophthalmoscopy,” Biomed. Opt. Express 9(11), 5477–5488 (2018). [CrossRef]

8. S. Martinez-Conde, S. L. Macknik, and D. H. Hubel, “The role of fixational eye movements in visual perception,” Nat. Rev. Neurosci. 5(3), 229–240 (2004). [CrossRef]

9. S. Martinez-Conde, “Fixational eye movements in normal and pathological vision,” Prog. Brain. Res. 154(Part A), 151–176 (2006). [CrossRef]

10. C. R. Vogel, D. W. Arathorn, A. Roorda, and A. Parker, “Retinal motion estimation in adaptive optics scanning laser ophthalmoscopy,” Opt. Express 14(2), 487–497 (2006). [CrossRef]

11. A. Dubra and Z. Harvey, “Registration of 2D images from fast scanning ophthalmic instruments,” in International Workshop on Biomedical Image Registration, (Springer, 2010), pp. 60–71.

12. S. Faisan, D. Lara, and C. Paterson, “Scanning ophthalmoscope retinal image registration using one-dimensional deformation fields,” Opt. Express 19(5), 4157–4169 (2011). [CrossRef]

13. C. K. Sheehy, Q. Yang, D. W. Arathorn, P. Tiruveedhula, J. F. de Boer, and A. Roorda, “High-speed, image-based eye tracking with a scanning laser ophthalmoscope,” Biomed. Opt. Express 3(10), 2611–2622 (2012). [CrossRef]

14. Q. Yang, J. Zhang, K. Nozato, K. Saito, D. R. Williams, A. Roorda, and E. A. Rossi, “Closed-loop optical stabilization and digital image registration in adaptive optics scanning light ophthalmoscopy,” Biomed. Opt. Express 5(9), 3174–3191 (2014). [CrossRef]

15. H. Li, J. Lu, G. Shi, and Y. Zhang, “Tracking features in retinal images of adaptive optics confocal scanning laser ophthalmoscope using KLT-SIFT algorithm,” Biomed. Opt. Express 1(1), 31–40 (2010). [CrossRef]

16. H. Chen, Y. He, L. Wei, X. Li, and Y. Zhang, “Automatic Dewarping of Retina Images in Adaptive Optics Confocal Scanning Laser Ophthalmoscope,” IEEE Access 7, 59585–59599 (2019). [CrossRef]

17. M. Chen, R. F. Cooper, G. K. Han, J. Gee, D. H. Brainard, and J. I. Morgan, “Multi-modal automatic montaging of adaptive optics retinal images,” Biomed. Opt. Express 7(12), 4899–4918 (2016). [CrossRef]

18. B. Davidson, A. Kalitzeos, J. Carroll, A. Dubra, S. Ourselin, M. Michaelides, and C. Bergeles, “Fast adaptive optics scanning light ophthalmoscope retinal montaging,” Biomed. Opt. Express 9(9), 4317–4328 (2018). [CrossRef]

19. V. Nourrit, J. M. Bueno, B. Vohnsen, and P. Artal, “Nonlinear registration for scanned retinal images: application to ocular polarimetry,” Appl. Opt. 47(29), 5341–5347 (2008). [CrossRef]

20. J. Mulligan, “Recovery of motion parameters from distortions in scanned images,” in NASA Conference Publication, (NASA, 1998), pp. 281–292.

21. P. F. Alcantarilla and T. Solutions, “Fast explicit diffusion for accelerated features in nonlinear scale spaces,” IEEE Trans. Pattern Anal. Mach. Intell 34(7), 1281–1298 (2011).

22. M. A. Fischler and R. C. Bolles, “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM 24(6), 381–395 (1981). [CrossRef]

23. D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vision 60(2), 91–110 (2004). [CrossRef]

24. H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, “Speeded-up robust features (SURF),” Comput. Vis. Image Und. 110(3), 346–359 (2008). [CrossRef]

25. D. W. Arathorn, Q. Yang, C. R. Vogel, Y. Zhang, P. Tiruveedhula, and A. Roorda, “Retinally stabilized cone-targeted stimulus delivery,” Opt. Express 15(21), 13731–13744 (2007). [CrossRef]

26. A. N. S. Institute, American national standard for safe use of lasers (Laser Institute of America, 2007).

27. Z. Wang and A. C. Bovik, “A universal image quality index,” IEEE Signal Process. Lett. 9(3), 81–84 (2002). [CrossRef]

28. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

29. G. Wu, M. Kim, Q. Wang, Y. Gao, S. Liao, and D. Shen, “Unsupervised deep feature learning for deformable registration of MR brain images,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, (Springer, 2013), pp. 649–656.

30. S. Miao, Z. J. Wang, and R. Liao, “A CNN regression approach for real-time 2D/3D registration,” IEEE Trans. Med. Imaging 35(5), 1352–1363 (2016). [CrossRef]

31. X. Yang, R. Kwitt, M. Styner, and M. Niethammer, “Quicksilver: Fast predictive image registration–a deep learning approach,” NeuroImage 158, 378–396 (2017). [CrossRef]

32. Z. Yang, T. Dan, and Y. Yang, “Multi-Temporal Remote Sensing Image Registration Using Deep Convolutional Features,” IEEE Access 6, 38544–38555 (2018). [CrossRef]

33. F. Ye, Y. Su, H. Xiao, X. Zhao, and W. Min, “Remote sensing image registration using convolutional neural network features,” IEEE Geosci. Remote Sensing Lett. 15(2), 232–236 (2018). [CrossRef]

Mean ± Std (%)	Data1	Data2	Data3	Data4	Data5	Data6	Data7	Data8	Data9	Data10	Ave
Patch-based	74.81 ± 6.01	66.42 ± 9.05	66.34 ± 13.01	69.17 ± 9.09	66.77 ± 8.55	80.64 ± 6.52	64.58 ± 8.90	70.64 ± 6.90	77.28 ± 6.47	70.38 ± 8.06	70.70 ± 8.26
First-order	72.45 ± 7.33	63.91 ± 8.76	64.37 ± 12.42	64.76 ± 8.93	61.72 ± 11.70	79.06 ± 6.12	59.96 ± 10.67	68.09 ± 9.31	75.62 ± 7.00	67.52 ± 11.10	67.75 ± 9.33
Second-order	75.57 ± 6.59	66.00 ± 7.90	66.30 ± 11.51	68.58 ± 8.30	67.75 ± 8.75	80.33 ± 5.19	64.52 ± 7.51	70.33 ± 6.49	77.25 ± 5.08	69.41 ± 9.08	70.60 ± 7.64
Third-order	76.57 ± *5.48*	68.34 ± *6.60*	68.64 ± 9.66	73.53 ± 7.28	70.39 ± 6.71	82.57 ± 3.35	68.00 ± 5.04	72.16 ± 4.93	77.93 ± 4.49	70.90 ± 6.68	72.90 ± 6.02
Fourth-order	77.38 ± 5.73	68.96 ± 6.96	69.78 ± *8.99*	74.38 ± *6.73*	73.76 ± *6.12*	82.78 ± 3.14	72.36 ± *4.84*	72.96 ± *4.50*	79.51 ± *4.10*	72.02 ± *6.49*	74.39 ± *5.76*
Fifth-order	76.82 ± 6.33	68.97 ± 8.05	69.54 ± 9.98	74.00 ± 8.33	74.74 ± 6.43	82.65 ± *2.97*	72.76 ± 5.14	72.65 ± 5.28	79.27 ± 4.54	71.90 ± 6.61	74.33 ± 6.37
Sixth-order	75.43 ± 7.18	67.56 ± 9.28	68.80 ± 10.78	73.29 ± 9.88	74.22 ± 6.66	82.41 ± 3.26	72.24 ± 5.99	71.53 ± 5.68	78.47 ± 5.33	71.15 ± 7.25	73.51 ± 7.13

Mean ± Std (%)	Data1	Data2	Data3	Data4	Data5	Data6	Data7	Data8	Data9	Data10	Ave
Patch-based	39.28 ± 6.82	41.19 ± 7.44	41.72 ± 10.21	42.19 ± 7.96	42.63 ± 7.49	45.05 ± 7.63	37.91 ± 7.87	41.70 ± 6.06	41.79 ± 7.04	45.13 ± 7.06	41.86 ± 7.56
First-order	37.00 ± 6.98	38.83 ± 7.17	40.03 ± 9.15	37.80 ± 7.40	38.05 ± 9.93	42.68 ± 6.41	34.08 ± 8.38	39.47 ± 7.66	40.01 ± 7.54	42.76 ± 9.13	39.07 ± 7.97
Second-order	40.29 ± 6.92	40.70 ± 6.61	41.68 ± 8.80	41.50 ± 7.46	44.16 ± 7.81	44.29 ± 5.81	37.94 ± 6.72	41.63 ± 5.87	41.78 ± 6.57	44.35 ± 8.13	41.83 ± 7.07
Third-order	41.34 ± *6.48*	42.62 ± *6.05*	43.74 ± 8.14	46.76 ± *6.98*	46.72 ± 6.31	47.15 ± 4.81	40.57 ± *5.73*	43.32 ± 5.16	42.62 ± *6.47*	45.77 ± 6.75	44.06 ± *6.29*
Fourth-order	42.27 ± 6.80	43.21 ± 6.70	44.78 ± *8.07*	47.53 ± 7.05	50.09 ± *6.08*	47.43 ± 4.65	44.46 ± 6.27	44.12 ± *4.93*	44.06 ± 6.54	46.92 ± *6.59*	45.49 ± 6.37
Fifth-order	41.86 ± 7.23	43.58 ± 7.79	44.74 ± 8.62	47.17 ± 8.19	51.03 ± 6.43	47.35 ± *4.57*	44.59 ± 6.64	43.81 ± 5.25	44.02 ± 6.92	46.74 ± 6.74	45.49 ± 6.84
Sixth-order	40.78 ± 7.85	42.62 ± 8.78	44.27 ± 9.14	46.65 ± 8.71	50.70 ± 6.69	47.12 ± 4.89	44.03 ± 7.20	42.78 ± 5.57	43.30 ± 7.38	46.15 ± 7.16	44.84 ± 7.34

Number of point pairs	Best transformation model
≤ 50	First-order polynomial model
≤ 75	Second-order polynomial model
≤ 125	Third-order polynomial model
≤ 550	Fourth-order polynomial model
≥ 550	Fifth-order polynomial model

Image Contrast (%)	Data1	Data2	Data3	Data4	Data5	Data6	Data7	Data8	Data9	Data10	Ave
Patch-based	37.86	27.20	27.62	29.08	28.08	41.77	31.16	32.89	43.40	31.04	33.01
Proposed method	38.34	27.94	28.39	30.00	28.74	42.23	32.68	33.11	43.89	31.72	33.70

k	Mean	Std	m	Mean	Std
k _0,0	−3.81	14.20	m _0,0	−5.59	18.15
k _1,0	0.998	0.0316	m _1,0	−0.0073	0.0454
k _0,1	0.0212	0.140	m _0,1	1.028	0.141
k _1,1	−1.25E-5	2.02E-4	m _1,1	2.78E-5	3.64E-4
k _2,0	8.66E-6	1.72E-4	m _2,0	4.90E-5	1.58E-4
k _0,2	−2.12E-4	1.06E-3	m _0,2	−1.16E-4	1.22E-6
k _2,1	2.21E-8	1.54E-6	m _2,1	−3.15E-8	1.22E-6
k _1,2	2.48E-8	8.07E-7	m _1,2	−7.90E-8	1.22E-6
k _3,0	−1.96E-8	5.89E-7	m _3,0	−7.90E-8	9.57E-7
k _0,3	6.99E-7	3.84E-6	m _0,3	−1.13E-7	4.38E-7
k _2,2	5.24E-11	1.53E-9	m _2,2	2.29E-7	4.29E-6
k _3,1	−6.07E-11	5.72E-10	m _3,1	−7.37E-11	1.42E-9
k _1,3	5.07E-12	3.36E-9	m _1,3	1.46E-10	6.66E-10
k _4,0	1.50E-11	6.30E-10	m _4,0	−2.48E-11	4.92E-9
k _0,4	−1.10E-9	1.38E-8	m _0,4	−7.52E-11	8.03E-9

Polynomial transformation model for frame-to-frame registration in an adaptive optics confocal scanning laser ophthalmoscope

Abstract

1. Introduction

2. Methodology

2.1 Identifying matched point pairs

2.2 Review of conventional transformation models

2.3 Retina motion model

2.4 Polynomial transformation model and close-form solution

3. Experimental validation

3.1 Data acquisition

3.2 Evaluation metrics

3.3 Experimental results

4. Discussion and conclusion

4.1 Discussion

4.2 Conclusion and future research

Funding

Acknowledgments

Disclosures

References

Cited By

Figures (9)

Tables (5)

Equations (18)

Biomedical Optics Express