## Abstract

By retrofitting the descriptor of the scale-invariant feature transform (SIFT) and developing a new similarity measure function based on trajectories generated from Lissajous curves, we construct a new remote sensing image registration approach that is more robust and accurate than prior approaches. In complex cases where the correct rate of feature matching is below 20%, the retrofitted SIFT descriptor improves the correct rate to nearly 100%. Moreover, the similarity measure function makes it possible to quantitatively analyze the temporal change of the same geographic position.

© 2010 OSA

## 1. Introduction

Image registration is a crucial step in many image analysis tasks. It is the process of overlaying two or more images of the same scene taken at different times, from different viewpoints, and/or by different sensors; these are referred to as the reference and template images. Its aim is to find a suitable transformation that makes the transformed template image similar to the reference one. The process generally consists of four steps: feature detection, feature matching, mapping function design, and image resampling and transformation. In remote sensing image registration, registration algorithms face two main challenges.

The first challenge arises from the different physical characteristics of the various sensors and/or photos taken at different wavelengths: the relation between the intensities of corresponding pixels is often intricate. Multiple intensity values in one image may correspond to a single value in the other, and features of one photograph may appear only partially in the other, or disappear completely. It is therefore necessary to develop a similarity measure that enhances the robustness and improves the accuracy of image registration. Currently, most similarity measures in use are area-based methods, such as intensity-based [1–6], frequency-based [7–10] and other types [11,12]. Among the area-based methods, mutual information is regarded as an efficient similarity measure for multi-modal image registration [13]. The precondition of area-based methods, however, is that windows of the primary image be similar to those of the reference one. In practice, remote sensing images of the same geographic position always involve rotation, scale, affine, or perspective transformations, etc., causing the windows to mismatch and rendering these methods invalid.

The second challenge is the geometric distortion stemming from these transformations. A feasible remedy is to extract features from the reference and template images separately and then select a descriptor to search for a suitable correspondence between the two feature sets. If the applied descriptor is invariant, independent and stable, it guarantees reliable registration. The scale-invariant feature transform (SIFT) [14] is one such reliable descriptor. Among the transformations mentioned above, the affine transformation is frequently used to approximate a local transformation when the depth of an object is much smaller than the viewing distance, as in remote sensing. Yet SIFT descriptors are sensitive to affine transformation. A second deficiency is that SIFT is derived solely from the pixels' intensity values. Together, these deficiencies cause the correct rate of feature matching to be quite low, even below 20%, when the SIFT algorithm is applied in some difficult cases.

In this paper, we propose an image registration approach based on a retrofitted SIFT algorithm and trajectories generated from Lissajous curves. On the basis of this method, we demonstrate that the accuracy of the similarity measure is greatly improved.

## 2. Method

#### 2.1 Retrofitted SIFT

SIFT is used to detect and extract local features in images, but it is sensitive to affine transformation and to intensity changes caused by noise, varying illumination and different sensors. This deficiency can be remedied by exploiting geomorphology, which remains stable in remote sensing images. Among the salient structural features extracted in feature-based methods, such as points, lines, edges and regions [15–17], the point feature turns out to be unreliable when there are significant photometric and deformational changes caused by the transformation. Compared with point features, contour-based methods are superior, not only because contours remain stable (or partly stable) under the significant changes caused by the multi-modality of images, but also because they carry transformation information [18–21]. We therefore optimize the SIFT algorithm in the following way.

We first obtain a collection of matched feature point pairs by SIFT and rank them by their similarities to achieve a coarse transformation. Since the correct rate of feature matching in this step is low, methods such as the least median of squares (LMS) [22] and random sample consensus (RANSAC) [23] algorithms are inefficient for estimating the parameters of the selected transformation model. We therefore propose to utilize the triangle-area representation (TAR) [24], which is relatively invariant to affine transformation, to select and generate sensible matching point pairs.

The TAR value is computed from the signed area of the triangle formed by three points, say $p_b$, $p_m$ and $p_e$. The corresponding signed area is defined as follows:

$$\mathrm{TAR}(p_b,p_m,p_e)=\frac{1}{2}\begin{vmatrix} x_b & y_b & 1 \\ x_m & y_m & 1 \\ x_e & y_e & 1 \end{vmatrix}, \tag{1}$$

where $(x_b,y_b)$, $(x_m,y_m)$ and $(x_e,y_e)$ are the coordinates of the three points; for a contour point with index $i$, the three vertices are taken at indices $i-k$, $i$ and $i+k$, respectively. If the relation between the reference and template images is given by an affine transformation, every such signed area is scaled by the constant determinant of the affine matrix, which is what makes the representation relatively invariant to affine transformation.
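As an illustration, the signed triangle area underlying TAR can be computed directly from the determinant form (a minimal Python sketch; the function name `tar` is ours):

```python
import numpy as np

def tar(pb, pm, pe):
    """Signed area of the triangle (pb, pm, pe): half the 3x3 determinant
    of the homogeneous point coordinates."""
    (xb, yb), (xm, ym), (xe, ye) = pb, pm, pe
    return 0.5 * np.linalg.det(np.array([[xb, yb, 1.0],
                                         [xm, ym, 1.0],
                                         [xe, ye, 1.0]]))
```

Under an affine map with matrix $A$, every such area is scaled by the constant factor $\det A$, so ratios of TAR values are unaffected by the transformation.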

For a pair of matched feature points $fr$ and $fs$, marked by colored crosses in Figs. 1(a), 1(b), we denote points on the curve $\mathrm{Er}$ as $\mathrm{Er}=\left\{{p}_{r}^{i}:i=1,2,\cdots ,n\right\}$ and points on the curve $\mathrm{Es}$ as $\mathrm{Es}=\left\{{p}_{s}^{j}:j=1,2,\cdots ,m\right\}$. The TAR representation of the edges can be calculated by means of the following equations:

$$\mathrm{ImR}(i,j)=\mathrm{TAR}\!\left(fr,\,p_r^i,\,p_r^j\right),\qquad \mathrm{ImS}(i,j)=\mathrm{TAR}\!\left(fs,\,p_s^i,\,p_s^j\right). \tag{4}$$

If the local transformation around $fr$ and $fs$ can be approximated as affine, then the transformation between $\mathrm{ImR}$ and $\mathrm{ImS}$ can be approximated as a scale-and-shift operation. We thus employ the SIFT algorithm to detect and match feature points from $\mathrm{ImR}$ and $\mathrm{ImS}$. According to Eq. (4), a point $(i,j)$ in $\mathrm{ImR}$ corresponds to a triangle formed by $fr$, ${p}_{r}^{i}$ and ${p}_{r}^{j}$ in the reference image; the same holds for $\mathrm{ImS}$. A pair of feature points detected from $\mathrm{ImR}$ and $\mathrm{ImS}$ corresponds to one pair of triangles, as shown in Figs. 1(c), 1(d). Since an affine transformation is determined by three non-collinear points, the local transformation around $fr$ and $fs$ can be estimated with the aid of nearby edges.
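The TAR matrices can be assembled by evaluating the signed area for every pair of edge points (a sketch under the same conventions as Eq. (4); the helper names are ours):

```python
import numpy as np

def tar(p0, p1, p2):
    """Signed triangle area via the cross product of the two edge vectors."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    return 0.5 * ((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))

def tar_matrix(f, edge):
    """ImR(i, j): signed area of the triangle (f, edge[i], edge[j])."""
    n = len(edge)
    m = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            m[i, j] = tar(f, edge[i], edge[j])
    return m
```

Note that the matrix is antisymmetric (swapping ${p}_{r}^{i}$ and ${p}_{r}^{j}$ flips the sign), so in practice only the upper triangle needs to be computed.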

With the aid of this heuristic information about the local transformation, the whole SIFT process is recalculated. First, the feature points are re-detected. Second, the main direction of each feature is determined and the descriptor is recalculated. Third, in the process of feature matching, a heuristic search is employed. Lastly, the locations of the matched feature points are adjusted with the similarity measure proposed in this paper. The matched feature points are then used again to estimate the parameters of the transformation, and the feature matching can be recalculated with SIFT before the transformation is re-estimated. After several iterations, we obtain a set of satisfactorily matched feature points.

#### 2.2 Similarity measure

To enhance the robustness, accuracy and reliability of a registration algorithm, we propose a novel similarity measure based on the *mutual information and Lissajous figure* (MILF). The Lissajous figure (or Lissajous curve) is the graph of the system of parametric equations

$$x(t)={A}_{x}\sin({\omega}_{x}t+{\varphi}_{x}),\qquad y(t)={A}_{y}\sin({\omega}_{y}t+{\varphi}_{y}). \tag{5}$$

Given a set of values of ${A}_{x},{A}_{y},{\omega}_{x},{\omega}_{y},{\varphi}_{x},{\varphi}_{y}$, a trajectory (denoted by $TR$) can be generated according to Eq. (5). If *λ* denotes the mutual information, the similarity (based on the Lissajous figure) between two points $pR$ and $pS$ is defined as

$$\mathrm{MILF}(pR,pS)=\lambda\big(G^{R};G^{S}\big), \tag{6}$$

where $G^{R}$ and $G^{S}$ are the sets of gray values sampled along the trajectories around $pR$ and $pS$, respectively.
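A trajectory can be sampled from Eq. (5) as follows (a minimal sketch; the sampling density `n` and the rounding to integer pixel positions are our assumptions):

```python
import numpy as np

def lissajous_trajectory(center, Ax, Ay, wx, wy, phix=0.0, phiy=0.0, n=512):
    """Sample n pixel positions along a Lissajous curve centered at `center`."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = center[0] + Ax * np.sin(wx * t + phix)
    y = center[1] + Ay * np.sin(wy * t + phiy)
    # Round to the nearest pixel; the gray values at these positions form TR.
    return np.stack([np.rint(x), np.rint(y)], axis=1).astype(int)
```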

If $T{R}_{1}$ and $T{R}_{2}$ are two trajectories around $pR$ generated according to Eq. (5), then ${G}_{1}^{R}=\left\{p{r}_{1}^{i}:i=1,2,\cdots ,n\right\}$ and ${G}_{2}^{R}=\left\{p{r}_{2}^{i}:i=1,2,\cdots ,n\right\}$ are two sets of gray values of points selected from $T{R}_{1}$ and $T{R}_{2}$, respectively. Equivalently, ${G}_{1}^{S}=\left\{p{s}_{1}^{i}:i=1,2,\cdots ,n\right\}$ and ${G}_{2}^{S}=\left\{p{s}_{2}^{i}:i=1,2,\cdots ,n\right\}$ are two sets of gray values of points selected from the trajectories $T{S}_{1}^{f}$ and $T{S}_{2}^{f}$ around $pS$, where $T{S}_{1}^{f}$ and $T{S}_{2}^{f}$ are the images of $T{R}_{1}$ and $T{R}_{2}$ under *f* (the transformation between the reference and template images). Let $p({r}_{1},{r}_{2})$ be the joint distribution of ${G}_{1}^{R},{G}_{2}^{R}$; $p({s}_{1},{s}_{2})$ the joint distribution of ${G}_{1}^{S},{G}_{2}^{S}$; and $p({r}_{1},{r}_{2},{s}_{1},{s}_{2})$ the joint distribution of ${G}_{1}^{R},{G}_{2}^{R},{G}_{1}^{S},{G}_{2}^{S}$. Based on these, $\mathrm{MILF}(pR,pS)$ can be calculated according to Eq. (7) as follows:

$$\mathrm{MILF}(pR,pS)=\sum_{{r}_{1},{r}_{2},{s}_{1},{s}_{2}} p({r}_{1},{r}_{2},{s}_{1},{s}_{2})\,\log\frac{p({r}_{1},{r}_{2},{s}_{1},{s}_{2})}{p({r}_{1},{r}_{2})\,p({s}_{1},{s}_{2})}. \tag{7}$$
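The mutual information among the four gray-value variables can be estimated from a joint histogram of the sampled values; the following is one plug-in estimator (the binning scheme is our assumption):

```python
import numpy as np

def milf(g1r, g2r, g1s, g2s, bins=8):
    """Plug-in estimate of the mutual information between the joint
    gray-value pairs (G1R, G2R) and (G1S, G2S)."""
    sample = np.stack([g1r, g2r, g1s, g2s], axis=1)
    joint, _ = np.histogramdd(sample, bins=bins)
    joint /= joint.sum()
    p_r = joint.sum(axis=(2, 3))   # marginal p(r1, r2)
    p_s = joint.sum(axis=(0, 1))   # marginal p(s1, s2)
    nz = joint > 0                 # empty cells contribute zero to the sum
    denom = (p_r[:, :, None, None] * p_s[None, None, :, :])[nz]
    return float(np.sum(joint[nz] * np.log(joint[nz] / denom)))
```

Because this estimate is a Kullback-Leibler divergence between the joint distribution and the product of its own marginals, it is always non-negative.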

Different Lissajous figures generate different trajectories for calculating the similarity measure. If two trajectories are adopted, then, through the mutual information among four discrete random variables, MILF not only captures the statistical dependency but also characterizes the spatial interrelationships of the gray tones. In this way, much more information is considered, leading to improved robustness and accuracy.

## 3. Experiments and discussion

Three sets of remote sensing images are selected for this experiment, as shown in Figs. 3 and 4. The details of the three sets of test images are tabulated in Table 1. Figures 3(c), 3(d) show only partial images selected from Figs. 3(a), 3(b). In order to evaluate the performance of the similarity measure proposed in this paper, a pair of points ($pr$ and $ps$) is chosen from Figs. 3(c), 3(d), marked by the crosses. Both of them correspond to the same geographic position.

In order to compare the performance of the optimized SIFT algorithm against the original SIFT algorithm, we extract 3440 feature points from the reference image and 3515 feature points from the template image. After matching with the optimized SIFT and the original SIFT separately, the matched feature points are ranked by their similarities. The correct matching rate of the top *N* pairs of matched feature points is used to evaluate performance; it is calculated according to $C(N)={N}_{r}/N$, where ${N}_{r}$ is the number of correctly matched feature point pairs among the top *N* pairs. The experimental results are shown in Fig. 5. From Fig. 5(a), we can see that the correct feature matching rate with the original SIFT is below 20%. One reason is that the photos shown in Figs. 3(a), 3(b) were taken at different wavelengths and times. Another reason is that the size and position of the image are changed and distorted, and even the color of the image changes because of the different spectral features of the images. Following the method in Sec. 2.1, we retrofit the SIFT algorithm by combining the local contour information. The optimized SIFT descriptor not only considers the information from the pixel values of the images, but also extracts topological information on the geography, such as coasts, rivers, and so forth. The retrofitted SIFT algorithm thus greatly improves the correct feature matching rate from below 20% to nearly 100%, as shown in Fig. 5(b).
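The evaluation metric $C(N)={N}_{r}/N$ is straightforward to compute (a sketch; the data structures are illustrative):

```python
def correct_rate(ranked_pairs, ground_truth, N):
    """C(N) = Nr / N: fraction of correct pairs among the top-N matches,
    where `ranked_pairs` is sorted by descending similarity."""
    nr = sum(1 for pair in ranked_pairs[:N] if pair in ground_truth)
    return nr / N
```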

Based on the similarity measure proposed in Sec. 2.2, the similarity value between a point in the reference image and every pixel in the template image can be easily obtained; these values are used to match geographic points. The pair of points in the reference and template images with the highest similarity value is taken to indicate the same geographic position.

If a $40\times 40$ window in the template image around the point $ps$ in Fig. 3(d) is chosen, the similarity values describing the correlation between the point $pr$ in Fig. 3(c) and the points in the window form a $40\times 40$ similarity matrix. First, the usual methods, such as zero-mean normalized cross correlation (ZNCC) [25] and mutual information (MI) [26], are applied to find $pr$'s matched point. The results are disappointing, as shown in Fig. 6. Returning to Fig. 3, we find that these two methods cannot help $pr$ find its real matched point $ps$. As illustrated in Figs. 6(a), 6(c), the positions marked by green crosses are produced by the two methods; they all deviate from the real corresponding geographic positions marked by white crosses. Furthermore, the related distributions of the similarity measure functions produced by ZNCC and MI, shown in Figs. 6(b), 6(d), are so flat that the theoretically obtained position deviates from the real geographic position.
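The exhaustive search over the window can be sketched as follows (the callable interface `similarity(pr, ps, template)` is a hypothetical stand-in for ZNCC, MI or MILF):

```python
import numpy as np

def best_match(similarity, pr, template, window):
    """Evaluate `similarity` at every pixel of the search window
    ((row0, col0), (height, width)) and return the best position."""
    (r0, c0), (h, w) = window
    scores = np.full((h, w), -np.inf)
    for i in range(h):
        for j in range(w):
            scores[i, j] = similarity(pr, (r0 + i, c0 + j), template)
    peak = np.unravel_index(np.argmax(scores), scores.shape)
    return (r0 + peak[0], c0 + peak[1]), scores
```

The `scores` array is exactly the $40\times 40$ similarity matrix discussed above when `window` covers that region.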

We now build two trajectories from Lissajous curves with the following parameters: ${A}_{x}={A}_{y}=120$, ${\omega}_{x}=111$, ${\omega}_{y}=121$ and ${A}_{x}={A}_{y}=135$, ${\omega}_{x}=111$, ${\omega}_{y}=131$. From Fig. 7(a), it is seen that the newly developed similarity measure function helps the point $pr$ in the reference image precisely match the same geographic position as the point $ps$ in the template one. Different from the previous cases, the peak in Fig. 7(b) marks the highest similarity measure value, which is exactly at the geographic position $ps$, marked by the green cross in Fig. 7(a). This makes it easier to match the same geographic positions in different remote sensing images.

For the three sets of images shown in Figs. 3 and 4, after combining all the methods developed above, the remote sensing images taken by different sensors at different wavelengths can be precisely registered, as shown in Fig. 8. Here, we use an image mosaic to show the registration results intuitively. The mosaic of the reference image and the template image is created in the following way. First, the reference image is divided into equal-sized rectangular sections. Then, every other section is filled with the corresponding part of the warped template image according to the estimated transformation. The correctness of the registration results can be verified visually by checking the continuity of the common edges and regions in the mosaic images.
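The checkerboard mosaic described above can be produced as follows (a minimal sketch; the tile size is an arbitrary choice):

```python
import numpy as np

def checkerboard_mosaic(ref, warped_template, tile=64):
    """Fill every other equal-sized rectangular section of the reference
    image with the corresponding part of the warped template image."""
    out = ref.copy()
    h, w = ref.shape[:2]
    for i in range(0, h, tile):
        for j in range(0, w, tile):
            if ((i // tile) + (j // tile)) % 2 == 1:
                out[i:i + tile, j:j + tile] = warped_template[i:i + tile, j:j + tile]
    return out
```

Discontinuities at the tile boundaries then reveal any residual misregistration.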

In order to evaluate the accuracy and performance of the proposed method, we also compare it against three widely available image registration tools, taking the mutual information of the reference and aligned template images as the metric. In this experiment, the first and second sets of images are used. The three tools are TurboReg [27], the image registration tool (IRT) [28] and ImReg [29]. The 2D projective transformation model that we have used is defined by Eq. (8) in homogeneous coordinates:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix}\sim\begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & 1 \end{pmatrix}\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}. \tag{8}$$
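Applying the 2D projective model to image coordinates amounts to a matrix product followed by a perspective division (a minimal sketch):

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 projective (homography) matrix H to an (N, 2) point array."""
    pts = np.asarray(pts, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])   # to homogeneous coords
    mapped = homog @ H.T
    return mapped[:, :2] / mapped[:, 2:3]              # perspective division
```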

## 4. Conclusion

In this paper, we proposed an image registration approach based on the retrofitted SIFT algorithm and a new similarity measure built on trajectories generated from Lissajous figures. By optimizing the SIFT algorithm, the accuracy of feature matching in some difficult cases of remote sensing image registration is improved from below 20% to nearly 100%. With the help of the proposed similarity measure, the accuracy of the image registration algorithm is improved considerably, making it possible to carry out quantitative analyses and measurements of the change of the same geographic position at different times. The experiments show that the method can register remote sensing images with satisfactory performance.

## Acknowledgments

This work was supported by the National Natural Science Foundation of China under Grant 20804039 and the Zhejiang Provincial Natural Science Foundation of China under Grant Y4080300. The authors would like to thank Intermap Technologies Inc./USGS for the test data.

## References and links

**1. **A. Wade and F. Fitzke, “A fast, robust pattern recognition system for low light level image registration and its application to retinal imaging,” Opt. Express **3**(5), 190–197 (1998). [CrossRef] [PubMed]

**2. **H. Chen, M. K. Arora, and P. K. Varshney, “Mutual information-based image registration for remote sensing data,” Int. J. Remote Sens. **24**(18), 3701–3706 (2003). [CrossRef]

**3. **Z. Li, Z. Bao, H. Li, and G. Liao, “Image autocoregistration and InSAR interferogram estimation using joint subspace projection,” IEEE Trans. Geosci. Rem. Sens. **44**(2), 288–297 (2006). [CrossRef]

**4. **J. Orchard, “Efficient least squares multimodal registration with a globally exhaustive alignment search,” IEEE Trans. Image Process. **16**(10), 2526–2534 (2007). [CrossRef] [PubMed]

**5. **Z. Cao, Y. Zheng, Y. Wang, and R. Yan, “An algorithm for object function optimization in mutual information-based image registration,” Proceedings of the 2008 Congress on Image and Signal Processing, **4**, 426–430 (2008).

**6. **G. Shao, F. Yao, and M. Malkani, “Aerial image registration based on joint feature-spatial spaces, curve and template matching,” in IEEE International Conference on Information and Automation (Hunan, China, 2008), pages 863–868.

**7. **A. Wong and J. Orchard, “Efficient FFT-accelerated approach to invariant optical-LIDAR registration,” IEEE Trans. Geosci. Rem. Sens. **46**, 3917–3925 (2008). [CrossRef]

**8. **I. Zavorin and J. Le Moigne, “Use of multiresolution wavelet feature pyramids for automatic registration of multisensor imagery,” IEEE Trans. Image Process. **14**(6), 770–782 (2005). [CrossRef] [PubMed]

**9. **J. G. Liu and H. Yan, “Phase correlation pixel-to-pixel image co-registration based on optical flow and median shift propagation,” Int. J. Remote Sens. **29**(20), 5943–5956 (2008). [CrossRef]

**10. **G. Hong and Y. Zhang, “Wavelet-based image registration technique for high-resolution remote sensing images,” Comput. Geosci. **34**(12), 1708–1720 (2008). [CrossRef]

**11. **S. Guyot, M. Anastasiadou, E. Deléchelle, and A. De Martino, “Registration scheme suitable to Mueller matrix imaging for biomedical applications,” Opt. Express **15**(12), 7393–7400 (2007). [CrossRef] [PubMed]

**12. **B. Zitova and J. Flusser, “Image registration methods: a survey,” Image Vis. Comput. **21**(11), 977–1000 (2003). [CrossRef]

**13. **A. Rajwade, A. Banerjee, and A. Rangarajan, “Probability density estimation using isocontours and isosurfaces: applications to information-theoretic image registration,” IEEE Trans. Pattern Anal. Mach. Intell. **31**(3), 475–491 (2009). [CrossRef] [PubMed]

**14. **D. Lowe, “Distinctive image features from scale-invariant keypoints,” Int. J. Comput. Vis. **60**(2), 91–110 (2004). [CrossRef]

**15. **K. Mikolajczyk and C. Schmid, “Scale and affine invariant interest point detectors,” Int. J. Comput. Vis. **60**(1), 63–86 (2004). [CrossRef]

**16. **T. Tuytelaars and L. Van Gool, “Matching widely separated views based on affinely invariant neighborhoods,” Int. J. Comput. Vis. **59**(1), 61–85 (2004). [CrossRef]

**17. **F. P. Nava, and A. P. Nava, “A probabilistic generative model for unsupervised invariant change detection in remote sensing images,” in IEEE International Geoscience and Remote Sensing Symposium (Barcelona, 2007), pages 2362–2365.

**18. **S. Jiao, C. Wu, R. W. Knighton, G. Gregori, and C. A. Puliafito, “Registration of high-density cross sectional images to the fundus image in spectral-domain ophthalmic optical coherence tomography,” Opt. Express **14**(8), 3368–3376 (2006). [CrossRef] [PubMed]

**19. **F. Eugenio, F. Marques, and J. Marcello, “A contour-based approach to automatic and accurate registration of multitemporal and multisensor satellite imagery,” in IEEE International Geoscience and Remote Sensing Symposium, Volume **6** (Toronto, 2002), pages 3390–3392.

**20. **N. Netanyahu, J. Le Moigne, and J. Masek, “Georegistration of Landsat data via robust matching of multiresolution features,” IEEE Trans. Geosci. Rem. Sens. **42**(7), 1586–1600 (2004). [CrossRef]

**21. **A. Wong and D. A. Clausi, “ARRSI: Automatic registration of remote-sensing images,” IEEE Trans. Geosci. Rem. Sens. **45**(5), 1483–1493 (2007). [CrossRef]

**22. **C. Stewart, “Robust parameter estimation in computer vision,” SIAM Rev. **41**(3), 513–537 (1999). [CrossRef]

**23. **M. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography,” Commun. ACM **24**(6), 381–395 (1981). [CrossRef]

**24. **N. Alajlan, I. El Rube, M. S. Kamel, and G. Freeman, “Shape retrieval using triangle-area representation and dynamic space warping,” Pattern Recognit. **40**(7), 1911–1920 (2007). [CrossRef]

**25. **N. Kyriakoulis, A. Gasteratos, and S. G. Mouroutsos, “Fuzzy vergence control for an active binocular vision system,” in 7th International Conference on Cybernetic Intelligent Systems (London, 2008), pages 1–5.

**26. **A. Rajwade, A. Banerjee, and A. Rangarajan, “Probability density estimation using isocontours and isosurfaces: applications to information-theoretic image registration,” IEEE Trans. Pattern Anal. Mach. Intell. **31**(3), 475–491 (2009). [CrossRef] [PubMed]

**27. **http://bigwww.epfl.ch/thevenaz/turboreg/.