## Abstract

Abstract: The automatic detection of aerosol- and cloud-layer boundary (base and top) is important in atmospheric lidar data processing, because the boundary information is not only useful for environment and climate studies, but can also be used as input for further data processing. Previous methods have demonstrated limitations in defining the base and top, window-size setting, and have neglected the in-layer attenuation. To overcome these limitations, we present a new layer detection scheme for up-looking lidars based on linear segmentation with a reasonable threshold setting, boundary selecting, and false positive removing strategies. Preliminary results from both real and simulated data show that this algorithm cannot only detect the layer-base as accurate as the simple multi-scale method, but can also detect the layer-top more accurately than that of the simple multi-scale method. Our algorithm can be directly applied to uncalibrated data without requiring any additional measurements or window size selections.

© 2013 Optical Society of America

## 1. Introduction

Detecting and quantifying the lifetime and spatial variations of aerosol particles and clouds can be very helpful in characterizing the physical properties as well as their impact on climate change [1]. Many probes have been used for determining the dynamic range and variability of aerosol particles and clouds, such as the sun-photometer and nephelometer, but lidar is a particular device that can obtain the vertical profile and scan in three dimensions thereby illustrating the complete distribution of information in the atmosphere [2]. Currently, lidar networks (e.g. the European Aerosol Research Lidar Network, and the Micro Pulse Lidar Network) and satellite-based lidars (e.g. the Cloud-Aerosol Lidar with Orthogonal Polarization, and the Geoscience Laser Altimeter System) have been widely used and studied [3–5]. Since lidar data processing is generally considered unconstrained- rather than a simple routine procedure [2], the development of practical algorithms remains a difficult task.

Layer detection is essentially based on differences in the physical and optical properties of aerosol- and cloud-layers that are reflected in a raw signal, *P*(*r*), or its derivative signals such as the range-corrected signal, *X*(*r*) [i.e., *P*(*r*)*r*^{2}], the atmospheric extinction coefficient or attenuated backscatter coefficients and so on [6]. The reflectance of the aerosol- and cloud-layers is generally represented as a significantly sharp increase relative to the value of clear air in the lidar signal. Generally, in the layer detection, the issues are detecting the layer base and top height, which are considered to be the beginning and ending of the sharp increase caused by a layer in the raw or derived backscatter signal, respectively [7–10]. At certain dynamic signal ranges, however, noise can cause a similar deviations which gives the false impression of a signal increase [11]. Thus, the key task of the layer detection is to distinguish genuine layer from noise in order to accurately determine the boundaries. Although many algorithms have been proposed, automatic layer detection is challenging, as discussed in detail in the following paragraphs.

Image processing techniques such as the wavelet method have been employed to determine the layer base, peak, and top, respectively [12]. CALIOP algorithm documents have analyzed in detail the disadvantages of those techniques and concluded that image processing methods should be rejected [4]. The wavelet method can correct for signal noise because it is essentially a low-pass filter. However, the local modulus maxims correspond to the “fastest-changing” bins, which are not exactly the base and top of the layer. This is due to changes near the layer base and top that are usually inconspicuous and “slower” than that of the other part of the layer.

An extinction based detection method has been proposed to deduce a layer boundary from retrieved extinction profiles [13]. However, an accurate retrieval should be done after obtaining information about the layer boundary because the lidar ratio for clear air, cloud, and aerosol is usually different. Furthermore, a clear region with high SNR (Signal-to-Noise Ratio) is needed for setting the initial boundary value in the retrieval, that may not be possible when there is an optically thick layer at lower altitude for the ground-based lidar.

Many methods based on a multi-point window have been proposed. For example, Pal et al. (1992) selected a layer base based upon the slope of the raw signal, which was obtained by a least-squares fitting with a multi-point window [8]. A layer base is selected when a certain number of consecutive slopes are larger than zero. However, the slope is very sensitive to noise and strongly oscillates about zero. Many other similar algorithms have been proposed [9, 14, 15], but were found to need an initial window size setting that in turn decreased the vertical resolution and caused errors.

The simple multi-scale method proposed by Mao et al. (2011) can avoid the selections of the window size. This method can successfully find both strong and weak layer signals and accurately position the layer base [16]. However, like most previous algorithms, this Mao et al. (2011) method neglects in-layer attenuation, and selects a layer top as the first range bin above the layer base with *X*(*r*) ≤ *X*(*r*_{b}) [8, 15]. This selection approach tends to under-estimate the layer top.

Threshold algorithms have been proposed based on attenuated scattering ratios, such as the CALIPSO profile scanner and similar algorithms for ground-based lidar data [4, 9, 17, 18]. These algorithms scan the vertical profile starting with “clear air returns”, and layer boundaries are located where the signal intensity exceeds the threshold array. However, a ground-based lidar signal may be fully attenuated to noise level before reaching a region of clear air when there is an optically thick, low altitude layer. Otherwise, the threshold algorithms would refine the layer top based on the slope, which is sensitive to noise.

Segmenting a lidar signal into piecewise sub-signals can be a key effective solution for lidar data processing. Gong et al. (2011) was the first to use an automatic linear segment technique to represent the *X*(*r*), which representation are subsequently used in cloud detection [19]. However, the method is still problematic (e.g., the threshold setting) and yields dissatisfactory results when the SNR is low.

In order to overcome the limitations in defining the base and top, subjectively setting window-size, or coarsely neglecting the in-layer attenuation in the previous methods, we present a new linear segment scheme for layer detection. The scheme is based on a more reasonable threshold array and can directly be used for *X*(*r*). False positive rejection criteria are used based on *X*(*r*) with varying thresholds. We first search for layer base and top based on a linear fitting of the segmentations. In order to enhance the robustness, we employ an “extrapolate into the layer” strategy. This will refine our estimate of layer base and top as both simulated and real data with various noise levels are designed as a stress test for the detection scheme. We find good consistency between layer-base heights retrieved by both the linear segment algorithm and the simple multi-scale method. However, the tops detected by the linear segment algorithm are much accurate than that detected by the simple multi-scale method. Our algorithm can be directly applied to uncalibrated data as no additional measurements or models are required and no window size needs to be selected.

## 2. Principles and methods

The backscatter power *P*(*r*) received by the detector can
be written as the following [2]:

*r*is the range,

*C*is lidar constant,

*G*(

*r*) represents the overlap factor [20],

*β*(

_{m}*r*) and

*β*(

_{p}*r*) are the molecule and particle backscatter coefficient respectively,

*α*(

_{m}*r*) and

*α*(

_{p}*r*) are the molecule and particle extinction coefficient respectively, and the noise

*e*(

*r*) is considered as Gaussian. Equation (1) illustrates that the lidar signal will increase considerably when a lidar laser pulse is incident into an optically thick layer. A diagram of the layer detection scheme based on

*X*(

*r*) (Fig. 1) shows that the scheme comprises four key steps that can be described as:

- (1) Signal segmentation: The segmentation is based on
*X*(*r*) with a threshold array of 6σ·*r*^{2}where σ is the standard deviation of the background noise, and can be estimated using the signal at a distant range where noise is prevalent thus no lidar return. The break-bins outputted by the segmentation can be candidates of base, peak and top of layers. - (2) Primary layer base and top selection: We search for the primary layer base when the fitted slope is positive, and select the primary top when fitted slope is negative and larger than or comparable to that below the primary base.
- (3) False positive rejection: When considering the envelope of the noise
*e*(*r*)*r*^{2}is ± 3σ·*r*^{2}, all detections whose signal difference is less than 3σ·*r*_{p}^{2}+ 3σ·*r*_{b}^{2}are considered as a false positive and subsequently removed,*r*_{p}and*r*_{b}are the range of the peak and base, respectively. - (4) Refine the layer base and top: To enhance the robustness, we employ an “extrapolate into the layer” strategy to refine the primary layer base and top based on the linear fitting of the range-corrected signal of the two nearest “clear air” segmentations.

#### 2.1 Signal segmentation

The segmentation idea has been widely used in various research fields, such as cartography [21] and data mining [22]. This is because segmenting a signal (curve) into a piecewise sub-signal (sub-curve) can be a key to further processing and application. Furthermore, a good segment algorithm can give the closest approximation to a result of visual analysis. To our knowledge, the initial form of the automatic segment approach for lidar data processing was first introduced and used by Gong et al. [19].

The goal of the lidar signal segmentation algorithm is to segment the signal into sub-signals, in which any backscatter contributions are considered as originating from the same object. We start with a range-corrected signal with *n* range bins *X*(*r*_{1}), *X*(*r*_{2}), ... and *X*(*r _{n}*). One of the major challenges of the segmentation is being able to determine the threshold array, which shall be discussed later. If the threshold array,

*d*(

_{thr}*r*), is given, we can subsequently select a bin as a break-bin according to the intensity difference between the

*X*(

*r*) and

*X*(

_{seg}*r*). The

*X*(

_{seg}*r*) is derived by linear interpolation with the

*X*(

*r*) of the beginning and ending bins. The absolute intensity difference can be written as:

After finding the maximum difference at range *r _{m}*, the corresponding range bin will be selected as a break-bin if

*d*(

*r*) exceeds the threshold value

_{m}*d*(

_{thr}*r*). Then, recursive segmenting calculations will be carried out based on the two sub-signals segmented by this break-bin until the threshold is no longer exceeded.

_{m}Determining *d*_{thr}(*r*) is one of the biggest challenges
in the segmentation method. Because the SNR of *X*(*r*) decreases
with range rapidly, it is not advisable to segment based on a constant threshold array. We
previously defined the error array, *e*(*r*) as the difference
between the *X*(*r*) and the denoised signal
*X _{de}*(

*r*) in [19]. Then, we defined the threshold array of a segmentation as the sum of the mean plus three times the standard deviation of

*e*(

*r*). One disadvantage of this strategy is that the

*d*(

_{thr}*r*) is very sensitive to noise, and the signal will be over-segmented as shown in Fig. 2(a).Furthermore, the segment has to be performed on a denoised signal rather than an original one, but a signal can be distorted by denoising technology.

_{m}Fortunately, a more sophisticated threshold array can be theoretically determined. In essence, the *X*(*r*) can be described as:

*S*(

*r*) is the pure raw signal. It shows that the noise level of

*X*(

*r*) is equivalent to

*e*(

*r*)

*r*

^{2}. Theoretically, the corresponding envelope of the noise

*e*(

*r*)

*r*

^{2}is ± 3σ·

*r*

^{2}. Furthermore, it can be shown that the curves of + 3σ·

*r*

^{2}and −3σ·

*r*

^{2}envelop the detected noise in Fig. 2(a) and the signal (primarily noise) above 4 km in Fig. 2(b), respectively. Thus, a threshold array

*d*(

_{thr}*r*) can be computed as:

The segmentations based on the threshold array of 6σ·*r*^{2} with *X*(*r*) can be seen in Fig. 2(b). The segmentation not only retains “marked-change bins” as break-bins, but also neglects most of the bins in the “no-change range” such as all the range bins above 4 km. The segmentation can avoid over-segmentation, which is beneficial for refining the layer boundary and removing any over-detections.

Based on the above discussion we revise “*r*_{1}, *r*_{2}, ..., *r _{n}*” to “1:n” for convenience and outline the pseudo-code of the recursive algorithm as follows:

Function [break_bins] = linear_segment [X(1:n), σ] | ||

// obtain X(1:_{seg}n) by interpolating X(1) and _{seg}X(_{seg}n) linearly: | ||

X(1:_{seg}n) = linear_interpolate[X(1), X(n)]; | ||

// obtain the absolute difference between X and X._{seg} | ||

d(1:n) = |X(1:n) -X(1:_{seg}n) |; | ||

// find d and its corresponding range bin._{m} | ||

[d, _{m}d _bin] = max[_{m}d(1:n)]; | ||

// obtain threshold array | ||

d(1:_{thr}n) = 6·σ·r^{2}(1:n); | ||

If d > _{m}d (_{thr}d _bin) // segment the signal _{m}X(1:n) with given threshold. | ||

// recursively segment the left part if necessary. | ||

[break_bins1] = linear_segment [X(1: d_bin), σ];_{m} | ||

// recursively segment the right part if necessary. | ||

[break_bins2] = linear_segment [X(d_bin + 1:end), σ];_{m} | ||

// combine the break bins of the left and right parts. | ||

break_bins = [break_bins1(1:end-1), break_bins2(1:end)]; | ||

Else | ||

// the break bins are the first and end bins of X(1:n) if d < _{m}d(_{thr}d _bin)_{m} | ||

break_bins = [X(1), X(n)]; | ||

End if | ||

End Function |

Furthermore, we derive a fitted signal *X _{fit,k}*(

*r*) =

*a*+

_{fit,k}·r*b*, where

_{fit,k}*a*and

_{fit,k}*b*are derived from all the

_{fit,k}*X*(

*r*) in the

*k-*th sub-signal (i.e., the signal between

*k*and

*k*+ 1 break-bins) by the least-squares method. The fitted slope

*a*(

_{fit}*r*) will be used for the detection of the layer base and top and is further discussed in the next section.

#### 2.2 Selection primary layer base and top

Generally, the layer base height is considered to be the beginning of the sharp increase in the direct or derived backscatter signal [7–10]. Detection of the cloud base can be difficult at times due to water uptake effects, haze, or even precipitation below the cloud base [10, 19]. However, our definition of the layer base is consistent with the previous studies. In essence, we select a layer base whenever *a _{fit}*(

*r*) becomes larger than zero. A second issue of layer detection is selecting the layer top. We initially estimate the layer top as the height of the first bin where

*X*(

*r*) ≤

*X*(

*r*

_{b}) above the base in the same manner as other studies [8]. To remedy the effects of in-layer attenuation, we further estimate the layer top as where the fitting slope is negative as well as greater than or comparable to that of the nearest clear-sky signal at a lower altitude. For an un-penetrated layer, the signal reduces to the noise level rapidly, and we further estimate the layer top height where we consider no signal but only noise. In this situation, the layer top is underestimated, and it is called the “effective top”.

#### 2.3 False positive rejection

Because *P*(*r*) has an “inverse square law” (1/*r*^{2}) effect, *X*(*r*) is used for rejecting false positive. As mentioned above, the noise level of *X*(*r*) is equal to *e*(*r*)*r*^{2}, and is therefore not recommended to use a constant threshold in order to reject the false positive. Thus, the rejection criterion is based upon varying threshold in *X*(*r*). By defining *X _{s}*(

*r*) and

_{p}*X*(

_{s}*r*) as pure range-corrected signals of the peak and base, respectively, the signal difference between layer peak and base can be written as:

_{b}The corresponding envelope of ∆ is *X _{s}*(

*r*)−

_{p}*X*(

_{s}*r*) ± (3σ·

_{b}*r*

_{p}^{2}+ 3σ·

*r*

_{b}^{2}). Furthermore,

*X*(

*r*) decrease along with range for clear air. Thus, for a false positive which is due to noise effect, the ∆ of noise variations should be no more than

We apply ∆* _{thr}* as a threshold to separate the genuine layer from false positive which are due to noise effect. Based on the threshold, all detections where ∆ is less than ∆

*are considered as false positive and subsequently removed.*

_{thr}Figure 3 shows two cases that investigate the
detection efficiency of the linear segment algorithm which is mainly caused by the false
positive rejection strategy. Figure 3(a) shows a pure
range-corrected signal, *X _{s}*(

*r*), and we use

*X*(

_{up}*r*) and

*X*(

_{low}*r*) to denote the upper envelope [i.e.

*X*(

_{s}*r*) + 3σ·

*r*

^{2}] and lower envelope [i.e.

*X*(

_{s}*r*)−3σ·

*r*

^{2}] of a noised range-corrected signal, respectively. Theoretically, according the defining of ∆

*, the algorithm will be triggered to discriminate a layer when*

_{thr}*X*(

*r*) is larger than

*X*(

_{up}*r*). The algorithm is able to successfully detect the layer as in the case in Fig. 3(a), because the

*X*(

_{low}*r*) is larger than the

_{p}*X*(

_{up-extrap}*r*), which is the extrapolation of the

_{p}*X*(

_{up}*r*) of the clear air lower than 4 km. However, the algorithm may fail to detect the layer for the case in Fig. 3(b), because the

*X*(

_{low}*r*) is less than

_{p}*X*(

_{up-extrap}*r*). In this situation, the possibility of the layer to be detected is equal to the sum of the possibilities

_{p}*X*(

_{layer}*r*) of every range bin larger than

*X*(

_{up-extrap}*r*). Thus, the larger of the geometric thickness of the layer, the larger of the possibility of the layer to be detected. Because the pattern of a layer is variable in the real world, it is difficult to suggest a constant possibility to be a reference for any given detection. Under the condition of Fig. 3(a), once a layer signal with random noise is given, a layer base will be detected in the range of 4-4.25 km. However, a layer base can likely be detected anywhere in the range of 4-5 km for a signal under the condition of Fig. 3(a).

Considering, an penetrable optically thick layer or geometrically thick layer will be easier to detect than a thin layer. We will only discuss the requirement for detecting optical and geometric thin layers as in Fig. 3(b) in the following. Considering *G*(*r*) as a unit as well as assuming *C*·*β _{m}*(

*r*)·

_{b}*T*

^{2}(

*r*) is equal to

_{b}*C*·

*β*(

_{m}*r*)·

_{p}*T*

^{2}(

*r*), the difference of

_{p}*X*(

*r*) and

_{p}*X*(

*r*) can be written as:

_{b}*T*

^{2}(

*r*) is the two-way transmittance at range

*r*. The range of ∆ is

*C*·

*β*(

_{p}*r*

_{p})·

*T*

^{2}(

*r*) ± (3σ·

_{p}*r*

_{p}^{2}+ 3σ·

*r*

_{b}^{2}). Because we will judge a detection to be a layer when ∆ is larger than ∆

*, theoretically, we have no difficulty in detecting a layer where*

_{thr}*β*(

_{p}*r*) meet the following condition:

_{p}In the same manner, the accuracy of the detection of the layer base (top) depend on how fast is the *β _{p}*(

*r*) to be increase (decrease) from zero (from the value of the left part of Eq. (8)) to the value of the left part of Eq. (8) (to zero).

#### 2.4 Refining layer base and top

The aforementioned selected bases and tops are generally break-bins, which are extreme values and therefore sensitive to noise. Sometimes the layer base (top) can be underestimated (overestimated) respectively, such as the primary selected base illustrate in Fig. 4 with a simulated layer in 4-5 km. To overcome the effects of noise, we employ an “extrapolate into the layer” strategy to refine the base and top of a retained layer after false positive rejection. We extrapolate the fitted signals into the layer up to the layer peak based on the linear fitting of the range-corrected signal of the two nearest “clear air” segmentations just below and above the layer as shown in Fig. 4. We refine our estimate of the layer base (top) by examining from the peak to the lower (higher) region, until the range-corrected signal is less than the extrapolated signal. The base/top heights are then updated after several iterations to further enhance the robustness of this method. We add the signal between the former selected base/top and the new one into the new fitting, and refine the base/top until they match the former ones. The impact of the refining is illustrated by Fig. 4, which shows that if we only select a break-bin as the layer base that will contain a larger error. However, after refining, the accuracy of the determined layer base can be improved significantly.

## 3. Results and discussion

#### 3.1 Testing with simulated signals

An extensive simulation study was conducted to verify that algorithm performance. Four kinds of signals were simulated, which were combined a pure signal with four different zero-mean Gaussian noise arrays. The pure signal is generated based upon a standard atmosphere with a layer (i.e. Gaussian peak) with base of 4 km and top of 5 km. The optical depth and lidar ratio of the layer are 0.05 and 20 sr, respectively. The error standard deviations of the noise arrays in Figs. 5(b)-5(d) are 2, 3, and 4 times as that of Fig. 5(a), respectively.

Figure 5 shows a case detection of the linear segment algorithm and the standard error-bar of both methods calculated based on 100 repeated simulations and detections. The statistics show decent accuracy of the layer detection of both methods, but the base (top) is overestimated (underestimated). This is because the enhancement of the signal intensity near the layer base and top are very weak which is easily hidden by noise. Figures 5(a)-5(d) shows that if the SNR is lower, then both the random error and bias will be larger, however the linear segment algorithm will still work well with less bias in a low SNR regime. The linear segment scheme can accurately position both the base and top owing to the selecting and refining strategies. The top underestimation of the simple multi-scale method is more than that of the linear segment algorithm, because in-layer attenuation is neglected while the top is simply selected as the first range bin where *X*(*r*) ≤ *X*(*r*_{b}) above the layer base [8, 15].

#### 3.2 Testing with real signals

We present an actual case that will demonstrate the performance of the layer detection algorithm. The data was designed as a stress test for the boundary detection scheme, which was acquired by a 532-nm Mie lidar on the 25-26 December 2008 in Wuhan, China. First, we present one signal example of continuous observation for illustrating the performance of the linear segment algorithm. Note that signals in the range affected by the overlap are not processed in this study. The segments in Fig. 6(a) show that the lower the SNR, the more range bins will be contained in a given segment. Thus, we can consider that the segment is self-adapting, and tends to combine as many range bins as possible in one segment to provide a robust linear fitting, which is beneficial for refining the layer boundary and removing false positives. Figure 6(b) shows that the slope of a clear air region is close to zero. As illustrated in Fig. 6(d), there are four layers that have been detected by comparing the original signal and the extrapolated line as in Fig. 6(c). The boundaries of the layers coincide with the locations that would have been identified by a visual inspection. The algorithm failed to refine the top of the layer at 1 km. In this situation, we consider the primary selected top as the final one. Furthermore, both of the small and larger layers have been successfully retained, but all of the false positives were effectively rejected.

Figure 7 shows a continuous measurement
identifying high layer above 8 km, low layer below 4 km, and overlaying optically thin layers
below 2 km. This is a good test to exercise all aspects of the linear segment algorithm. In
this study, we consider a continuous layer as cloud layers when the mean ratio of
*X*(*r*_{p}) to
*X*(*r*_{b}) is larger than four, otherwise it is
considered as an aerosol layer based on previous studies [12, 16, 19]. Note that this classification criterion should be used with caution before
further investigation has been done [23]. Figure 7(a) shows the boundaries of the aerosol (gray curve)
and cloud (black curve) layers as reported by the simple multi-scale method while Fig. 7(b) pertains to the linear segment algorithm. Figure 7 demonstrates that both algorithms detect the layer
satisfactorily, and their results are nearly consistent with one another. Note that the result
of the simple multi-scale method is slightly different from our previous result [16] because we have updated the background-noise removal and
calibration methods in the new version of our lidar package.

Figure 8 shows a scatter-plot of the layer bases (with
“o” markers in blue) and tops (with “+” markers in red) retrieved
by the linear segment algorithm and the simple multi-scale method, respectively. We consider
the two estimates of the two methods belong to the same layer when they share similar peaks
within the same signal. A base (top) regression-line was solved from the two kind of bases
(tops) as detected by the two methods, respectively. We find that the base regression-line
matches the “1:1” line very well. However, the top regression-line is below the
“1:1” line which is mainly due to the fact that the linear segment method
considers the in-layer attenuation of an optically thick cloud at 3~4 km while the simple
multi-scale method does not. As shown in Fig. 8, the
tops below 2 km are in good agreement mainly because the aerosol layer is spatially and
optically thin such that any in-layer attenuation is negligible. However, the tops above 7 km
are in good agreement primarily because the noise level is very low and therefore hides the
effect of the in-layer attenuation. The simple multi-scale method will separate a multi-peak
layer into two layers when there are bins with *X*(*r*) ≤
*X*(*r*_{b}) between the peaks of the layer. This occurs
even when the bins are at the very least affected by pure Rayleigh scattering as opposed to
other stronger atmospheric effects (i.e., aerosols, hydrometeors, etc). This incorrect
identification is the source of the outliers as marked in Regions A and B in Fig. 8.

## 4. Summary

A linear segmentation algorithm has been proposed to analyze lidar backscatter data sets for detecting the base and top of significant particle layers in the atmosphere. This algorithm is based on four key steps, which are signal segmentation, primary layer base and top selection, false positive rejection, and layer base and top refining. We find no significant bias in the detected of the linear segment algorithm. Small backscatter spikes due to noise are fully ignored by both methods, but the algorithms still work well when the SNR is low. Identification of layer boundary is done based on linear segmentation, and accurately refines the estimate of layer boundary by extrapolating the linear fitting from the nearest “clear air” segmentations into the layer. We find good agreement between cloud-base heights retrieved by both methods. Furthermore, the top detection is underestimated by employing the simple multi-scale method because the definition of which neglects in-layer attenuation. However, the linear segment algorithm also can position the top satisfactorily owing to the accurate segmentation and the refining strategies.

In this study, only a few cases have been tested with encouraging results, therefore the potential of using the linear segment in various tests and applications of lidar data processing should be explored. Improving or even proposing new segment algorithms are needed in future studies. Furthermore, cases of cloud embedded within aerosol layers should be considered in future studies.

## Acknowledgments

This work was supported by 973 Programs (2011CB707106), and the NSFC (41127901, 10978003).

## References and links

**1. **K. N. Liou, *An Introduction to Atmospheric Radiation* (Academic Press, 2002).

**2. **V. A. Kovalev and W. E. Eichinger, *Elastic Lidar: Theory, Practice, and Analysis Methods* (Wiley-Interscience, 2004).

**3. **M. Feiyue, G. Wei, and M. Yingying, “Retrieving the aerosol lidar ratio profile by combining ground- and space-based elastic lidars,” Opt. Lett. **37**(4), 617–619 (2012). [CrossRef] [PubMed]

**4. **M. Vaughan, D. M. Winker, and K. Powell, “CALIOP algorithm theoretical basis document, part 2: Feature detection and layer properties algorithms,” (NASA Langley Research Center, Hampton, Virginia, USA, 2005).

**5. **J. B. Senberg, A. Ansmann, J. M. Baldasano, D. Balis, C. B. Ckmann, B. Calpini, A. Chaikovsky, P. Flamant, A. Hgrd, and V. Mitev, “EARLINET: a European aerosol research lidar network,” in *Advances in laser remote sensing,* (Selected Papers of the 20th International Laser Radar Conference, 2001), pp. 155–158.

**6. **F. Mao, W. Gong, S. Song, and Z. Zhu, “Determination of the boundary layer top from lidar backscatter profiles using a Haar wavelet method over Wuhan, China,” Opt. Laser Technol. **49**, 343–349 (2013). [CrossRef]

**7. **F. Rocadenbosch, M. Sicard, M. N. M. Reba, and S. Tomas, “Morphological tools for range-interval segmentation of elastic lidar signals,” in IEEE International Geoscience and Remote Sensing Symposium(IGARSS), 2007), 4372~4375. [CrossRef]

**8. **S. R. Pal, W. Steinbrecht, and A. I. Carswell, “Automated method for lidar determination of cloud-base height and vertical extent,” Appl. Opt. **31**(10), 1488–1494 (1992). [CrossRef] [PubMed]

**9. **D. M. Winker and M. A. Vaughan, “Vertical distribution of clouds over Hampton, Virginia observed by lidar under the ECLIPS and FIRE ETO programs,” Atmos. Res. **34**(1-4), 117–133 (1994). [CrossRef]

**10. **Z. Wang and K. Sassen, “Cloud type and macrophysical property retrieval using multiple remote sensors,” J. Appl. Meteorol. **40**(10), 1665–1682 (2001). [CrossRef]

**11. **F. Mao, W. Gong, and C. Li, “Anti-noise algorithm of lidar data retrieval by combining the ensemble Kalman filter and the Fernald method,” Opt. Express **21**(7), 8286–8297 (2013). [CrossRef] [PubMed]

**12. **Y. Morille, M. Haeffelin, P. Drobinski, and J. Pelon, “STRAT: An automated algorithm to retrieve the vertical structure of the atmosphere from single-channel lidar data,” J. Atmos. Ocean. Technol. **24**(5), 761–775 (2007). [CrossRef]

**13. **J. Gaumet, J. Heinrich, M. Cluzeau, P. Pierrard, and J. Prieur, “Cloud-base height measurements with a single-pulse erbium-glass laser ceilometer,” J. Atmos. Ocean. Technol. **15**(1), 37–45 (1998). [CrossRef]

**14. **S. A. Young, “Analysis of lidar backscatter profiles in optically thin clouds,” Appl. Opt. **34**(30), 7019–7031 (1995). [CrossRef] [PubMed]

**15. **F. Mao, W. Gong, J. Li, and J. Zhang, “Cloud detection and coefficient retrieve based on improved differential zero-crossing method for Mie lidar,” Acta Opt. Sin. **30**(11), 3097–3102 (2010). [CrossRef]

**16. **F. Mao, W. Gong, and Z. Zhu, “Simple multiscale algorithm for layer detection with lidar,” Appl. Opt. **50**(36), 6591–6598 (2011). [CrossRef] [PubMed]

**17. **M. A. Vaughan, K. A. Powell, D. M. Winker, C. A. Hostetler, R. E. Kuehn, W. H. Hunt, B. J. Getzewich, S. A. Young, Z. Liu, and M. J. McGill, “Fully automated detection of cloud and aerosol layers in the CALIPSO lidar measurements,” J. Atmos. Ocean. Technol. **26**(10), 2034–2050 (2009). [CrossRef]

**18. **J. R. Campbell, K. Sassen, and E. J. Welton, “Elevated cloud and aerosol layer retrievals from micropulse lidar signal profiles,” J. Atmos. Ocean. Technol. **25**(5), 685–700 (2008). [CrossRef]

**19. **W. Gong, F. Mao, and S. Song, “Signal simplification and cloud detection with an improved Douglas-Peucker algorithm for single-channel lidar,” Meteorol. Atmos. Phys. **113**(1-2), 89 (2011). [CrossRef]

**20. **W. Gong, F. Mao, and J. Li, “OFLID: Simple method of overlap factor calculation with laser intensity distribution for biaxial lidar,” Opt. Commun. **284**(12), 2966–2971 (2011). [CrossRef]

**21. **D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Int. J. Geo. Inf. and Geo. **10**, 112–122 (1973).

**22. **E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in (Proceedings 2001 IEEE International Conference on Data Mining, 2001), 289–296. [CrossRef]

**23. **S. Burton, R. Ferrare, M. Vaughan, A. Omar, R. Rogers, C. Hostetler, and J. Hair, “Aerosol classification from airborne HSRL and comparisons with the CALIPSO vertical feature mask,” Atmos. Meas. Tech. **6**(5), 1397–1412 (2013). [CrossRef]