## Abstract

Layer boundary (base and top) detection is a basic problem in lidar data processing, the results of which are used as inputs for optical property retrieval. However, traditional algorithms not only require manual intervention but also rely heavily on the signal-to-noise ratio. Therefore, we propose a robust and automatic algorithm for layer detection based on a novel algorithm for lidar signal segmentation and representation. Our algorithm is based on the lidar equation and avoids most of the limitations of the traditional algorithms. Tests on simulated and real signals show that the algorithm is able to position the base and top accurately even with a low signal-to-noise ratio. Furthermore, the results of the classification are accurate and satisfactory. The experimental results confirm that our algorithm can be used for automatic detection, retrieval, and analysis of lidar data sets.

© 2015 Optical Society of America

## 1. Introduction

Aerosols and clouds not only play key roles in global and local climates but are also two of the largest uncertainty sources in climate sensitivity research [1]. Among the various instruments used to measure aerosol and cloud properties, lidar is an active probe particularly suited to measuring the atmospheric vertical profile [2]. Satellite lidars and lidar networks have been established for wide-reaching studies on local and global climatologies [3, 4]. Therefore, developing practical and effective algorithms for lidar data processing is important to unlock the potential of larger lidar data sets.

The accurate determination of aerosol and cloud layers is one of the first and most significant steps of lidar signal processing. Generally, the principle of layer detection is to discriminate layers from clear air based on the change in the raw signal *P*(*r*), where *r* is the detection range, or in its derived signals, such as the range-corrected signal *P*(*r*)*r*^{2}, the atmospheric extinction coefficient *α*_{p}(*r*), or the attenuated backscatter coefficients, which reflect the change in the physical and optical properties of aerosol and cloud layers. Automatic layer detection has been a challenge for many years [2, 5, 6]. Kovalev, et al. [7] stated that the boundaries of layers could be easily selected by “eyeball” identification based on the temporal and spatial changes in the lidar signals. The results of “eyeball” identification were subjectively better than those of previous automatic algorithms, which suggests that developing new automatic algorithms is necessary.

The two main problems in the detection of aerosol and cloud layers are determining the base and the top of a layer. A common weakness of previous algorithms is that they use a multipoint detection window and a subjective threshold, which tend to decrease the vertical resolution and cause errors. For instance, Pal, et al. [5] defined the layer base using the slope of the signal, determined by a least squares fit within a multipoint window; the layer base is selected when the number of consecutive increasing bins exceeds a certain threshold. Furthermore, Young [8] used the lidar signal together with the backscatter of a known atmosphere to devise a two-independent-window approach for detecting the base and top of a cloud: the cloud base is detected when the ratio of the real signal to the extrapolated signal exceeds a threshold for another threshold number of bins. Many other similar algorithms have been proposed, but they need manual intervention to tune the window size or threshold [9–12]. Wang and Sassen [12] combined lidar with radar, and the combination yielded satisfying cloud detection results. However, this combined approach is expensive and cannot be used to detect optically thin layers. Furthermore, Wang and Sassen [12] reported that no universal algorithm for layer detection can be used for all conditions. Several algorithms have since been proposed, and these are discussed in the subsequent paragraphs.

Morille, et al. [13] used the wavelet method to identify the local modulus maxima on the left and right of the layer peak as the layer base and top, respectively. This method can work well when the signal-to-noise ratio (SNR) is low, but the local modulus maxima correspond to the “fastest changing” bins, which are not exactly the layer base and top, because changes in the signal near the layer base and top are usually more inconspicuous than those in other parts. Another algorithm reduces the signal variation by reorganizing the lidar signals in ascending order based on the histogram equalization method, after smoothing with semidiscretization processing, which results in the loss of details of the signal [14].

The simple multiscale algorithm proposed by Mao, et al. [6] can easily determine the base with accuracy comparable to the “eyeball” technique when the SNR is acceptable. However, similar to most previous algorithms, this algorithm defines the layer top height as the first range bin above the layer base with *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2} [5, 10], where *r*_{b} is the range of the base. This definition neglects the attenuation of the layers. Therefore, deriving the layer transmittance, optical depth, and lidar ratio based on the identified base and top is difficult. Segmenting a lidar signal accurately into piecewise sub-signals can be the key to effective solutions for layer detection. Gong, et al. [15] and Mao, et al. [16] used piecewise linear representation to segment *P*(*r*)*r*^{2} for cloud detection and optical property retrieval. However, linear representation is problematic in detection because the lidar signal is nonlinear.

We present a new algorithm based on a novel lidar signal segmentation method to overcome the limitations of previous algorithms. Our algorithm does not need a window size or a threshold for the number of increasing bins to be selected. Because the noise of *P*(*r*)*r*^{2} is *e*(*r*)*r*^{2}, where *e*(*r*) is the noise of *P*(*r*), an algorithm based on *P*(*r*)*r*^{2} suffers from a noise-amplification problem [17]. However, *P*(*r*) is utilized in our proposed algorithm, which avoids the noise-amplification problem of *P*(*r*)*r*^{2} in some previous algorithms. The algorithm is able to position the base and top accurately because of accurate and efficient segmentation. We present the results of layer detection experiments using the algorithm on real lidar data and on simulated data with an extensive range of noise levels. The algorithm performs well in identifying the region of a layer, and the layer classification results based on the signal ratio of the layer peak and base are satisfactory.

## 2. Principles and methods

#### 2.1 Lidar equation

A general lidar equation that only considers single scattering can be written as follows [2]:

$$P(r)=\frac{cG(r)A}{r^{2}}\,T_{1}^{2}\,\beta(r)\,\exp\left[-2\int_{r_{1}}^{r}\alpha(r')\,\mathrm{d}r'\right]+e(r) \qquad (1)$$

where

- *r* is the detection range;
- *P*(*r*) is the lidar signal, the background noise of which has been removed;
- *c* is the lidar constant;
- *G*(*r*) is the overlap factor, which is the ratio of the energy detected by the detector to the energy hitting the telescope mirror [18];
- *A* is the mirror area of the telescope;
- *T*_{1}^{2} is the two-way transmittance from the lidar system to the range *r*_{1};
- *β*(*r*) is the atmospheric backscattering coefficient;
- *α*(*r*) is the atmospheric extinction coefficient; and
- *e*(*r*) is the noise, which is considered Gaussian.

For an ideal homogeneous atmosphere, the full overlap range is from *r*_{1} to *r*_{2}, where *α*(*r*), *β*(*r*), and *G*(*r*) are constants. The ideal signal *P*_{i}(*r*) can be described as follows:

$$P_{i}(r)=\frac{C}{r^{2}}\exp\left[-2\alpha\,(r-r_{1})\right] \qquad (2)$$

where *C* = *cGβAT*_{1}^{2} and *α* are constants obtained using least squares fitting of *P*(*r*). Furthermore, *C* and *α* can be approximately obtained from the signal of the first and end range bins using the following equations:

$$C=P(r_{1})\,r_{1}^{2} \qquad (3)$$

$$\alpha=\frac{\ln\left[P(r_{2})\,r_{2}^{2}\right]-\ln\left[P(r_{1})\,r_{1}^{2}\right]}{-2\,(r_{2}-r_{1})} \qquad (4)$$

Then, we use *P*_{s}(*r*), *C*_{s}, and *α*_{s}(*r*) to represent the estimates of *P*(*r*), *C*, and *α*(*r*), respectively, based on the signal of the first and end range bins. We also use *P*_{f}(*r*), *C*_{f}, and *α*_{f}(*r*) to represent the estimates of *P*(*r*), *C*, and *α*(*r*), respectively, based on the least squares fitting. In Eq. (4), given that $-2({r}_{2}-{r}_{1})$ is negative, we can deduce that *α*_{f}(*r*) is positive when the signal decreases (i.e., *P*(*r*_{2})*r*_{2}^{2} is less than *P*(*r*_{1})*r*_{1}^{2}); otherwise, *α*_{f}(*r*) is close to zero or negative.
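As a concrete sketch, Eqs. (3) and (4) amount to reading *C* and *α* off the first and last bins of a segment. The following minimal Python sketch (the function name and array conventions are ours, not from the original) recovers the constants of a noise-free homogeneous segment:

```python
import numpy as np

def endpoint_estimates(P, r):
    """Estimate C and alpha for one segment from its first and last
    range bins, following Eqs. (3) and (4)."""
    Pr2 = P * r**2                       # range-corrected signal P(r)*r^2
    C = Pr2[0]                           # Eq. (3): C = P(r1) * r1^2
    alpha = (np.log(Pr2[-1]) - np.log(Pr2[0])) / (-2.0 * (r[-1] - r[0]))  # Eq. (4)
    return C, alpha

# Synthetic homogeneous segment built from the ideal model of Eq. (2)
r = np.linspace(1000.0, 2000.0, 101)                  # range bins [m]
alpha_true = 1e-4                                     # extinction [1/m]
P = 1e9 / r**2 * np.exp(-2.0 * alpha_true * (r - r[0]))
C_s, alpha_s = endpoint_estimates(P, r)
```

For a noise-free homogeneous signal, the endpoint estimates recover the model constants exactly; on real data they are only a coarse first guess, which is why the paper refines them by nonlinear fitting.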

#### 2.2 Flow of the layer detection algorithm

An algorithm is proposed for the layer detection of a lidar signal based on Eq. (2). Only the signal of the full overlap region is processed in the detection. The algorithm comprises the following four steps:

- (1) Signal segmentation: An algorithm is proposed to segment a given signal *P*(*r*) with a given threshold. The algorithm yields an optimal representation so that the maximum error for any segment does not exceed a given threshold *d*_{thr}, which we will discuss later.
- (2) Extinction coefficient fitting: The extinction coefficients can be approximately obtained from the segmentation algorithm as *α*_{s}(*r*). However, we apply nonlinear least squares fitting to yield a more precise estimate of the extinction coefficient, *α*_{f}(*r*), based on the segmentation for subsequent layer detection.
- (3) Layer base, top, and peak detection: We define the base-to-peak region (BPR) as the region with a negative *α*_{f}(*r*). The layer base and peak are the first and last bins of the BPR. We define the first range bin where *α*_{f}(*r*) is close to the extinction of clear air as the layer top. Furthermore, we refine the layer base (top) by using an “extrapolate into the layer” strategy.
- (4) Layer classification: Following Morille, et al. [13], we classify each range bin as belonging to an aerosol or cloud layer using 4 as the threshold on *P*(*r*_{p})*r*_{p}^{2}/*P*(*r*_{b})*r*_{b}^{2} of the connected layers, where *P*(*r*_{p})*r*_{p}^{2} and *P*(*r*_{b})*r*_{b}^{2} are the signals of the layer peak and base, respectively.

#### 2.3 Piecewise lidar signal representation and segmentation

A signal (curve) can be segmented into piecewise linear subsignals (subcurves) for further analysis [19, 20]. This idea has been applied for cloud detection and optical property retrieval with lidar [15, 16, 21]. Generally, one measured lidar signal can be approximately considered as *k* piecewise sub-lidar signals of “homogeneous atmospheres”. Segmenting a lidar signal accurately into *k* piecewise sub-lidar signals can be the key to effective solutions for data processing, such as layer detection. In our previous study, we used piecewise linear representation to segment the range-corrected signal *P*(*r*)*r*^{2} [15]. However, the noise level of *P*(*r*)*r*^{2} is equivalent to *e*(*r*)*r*^{2}, which increases with increasing range. Moreover, the *P*(*r*)*r*^{2} signal of a homogeneous atmosphere is exponential. Therefore, we segment *P*(*r*) instead of *P*(*r*)*r*^{2}, based on the lidar equation expressed in Eq. (2) instead of a linear function.
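The range dependence of the noise can be checked numerically. The short sketch below (our own illustration, with arbitrary units) shows that multiplying a constant-variance noise *e*(*r*) by *r*^{2} makes the noise level of the range-corrected signal grow strongly with range, which is the motivation for segmenting *P*(*r*) directly:

```python
import numpy as np

rng = np.random.default_rng(0)
r = np.linspace(1000.0, 10000.0, 1000)   # range bins [m]
e = rng.normal(0.0, 1.0, r.size)         # noise e(r): constant std over range

noise_rc = e * r**2                      # noise of the range-corrected signal
near = np.std(noise_rc[:100])            # noise level in the near range
far = np.std(noise_rc[-100:])            # noise level in the far range
# far >> near: e(r)*r^2 grows roughly as r^2, while e(r) itself does not
```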

Briefly, the *P*(*r*) segmentation problem can be described as follows: We begin with a given lidar signal *P*(*r*) with *n* range bins and the corresponding simplified signal *P*_{s}(*r*). Given the tolerance *d*_{thr} of the intensity difference between the simplified and original signals, we derive the range bin *p*_{m} of the maximum difference *d*_{m}. The signal is then segmented into left and right parts if *d*_{m} exceeds *d*_{thr}, recursively, until the maximum error of any segment does not exceed *d*_{thr}. In practice, the recursive algorithm proceeds as follows:
Function [break_bins] = NonlinearSeg [*P*(1:*n*), *σ*]

- • obtain *C*_{s} employing Eq. (3);
- • obtain *α*_{s} employing Eq. (4) and compute *P*_{s}(1:*n*) with Eq. (2);
- • *d*(1:*n*) = |*P*(1:*n*) − *P*_{s}(1:*n*)|; // obtain the absolute difference between *P* and *P*_{s}
- • [*d*_{m}, *d*_{m}_bin] = max[*d*(1:*n*)]; // find *d*_{m} and its corresponding range bin
- • ∆*P* = 0.05·mean[*P*(1:*n*)]; // the definition of ∆*P* will be discussed later
- • *d*_{thr} = ∆*P* + 6*σ*; // obtain *d*_{thr}
- • If *d*_{m} > *d*_{thr}
  - · [break_bins1] = NonlinearSeg [*P*(1:*d*_{m}_bin), *σ*]; // recursively segment the left part if necessary
  - · [break_bins2] = NonlinearSeg [*P*(*d*_{m}_bin + 1:end), *σ*]; // recursively segment the right part if necessary
  - · break_bins = [break_bins1(1:end − 1), break_bins2(1:end)]; // combine the break bins of the left and right parts
- • Else
  - · break_bins = [1, *n*]; // the break bins are the first and end bins of *P* if segmentation is unnecessary
- • end if

end Function
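The pseudocode above translates almost line for line into Python. The sketch below is our own rendering (not the authors' code); it returns break-bin indices rather than signal values, and it builds the simplified signal from the ideal model of Eq. (2) with the endpoint estimates of Eqs. (3) and (4):

```python
import numpy as np

def nonlinear_seg(P, r, sigma, dp_frac=0.05):
    """Recursive NonlinearSeg sketch: return break-bin indices of P(r)
    such that no segment deviates from its ideal model, Eq. (2),
    by more than d_thr = dp_frac * mean(P) + 6 * sigma."""
    n = len(P)
    if n < 3:
        return [0, n - 1]
    Pr2 = P * r**2
    # Eqs. (3)-(4): endpoint estimates of C and alpha for this segment
    C_s = Pr2[0]
    alpha_s = (np.log(Pr2[-1]) - np.log(Pr2[0])) / (-2.0 * (r[-1] - r[0]))
    # simplified signal P_s(r) from the ideal homogeneous model, Eq. (2)
    P_s = C_s / r**2 * np.exp(-2.0 * alpha_s * (r - r[0]))
    d = np.abs(P - P_s)                           # difference signal
    m = int(np.argmax(d))                         # bin of maximum difference d_m
    d_thr = dp_frac * np.mean(P) + 6.0 * sigma    # tolerance: dP + 6*sigma
    if d[m] > d_thr:
        left = nonlinear_seg(P[:m + 1], r[:m + 1], sigma, dp_frac)
        right = nonlinear_seg(P[m:], r[m:], sigma, dp_frac)
        # drop the duplicated break bin and shift the right-part indices
        return left[:-1] + [m + i for i in right]
    return [0, n - 1]

# Two homogeneous pieces with a jump in backscatter at bin 100:
r = np.linspace(1000.0, 2000.0, 201)
P = 1e9 / r**2
P[100:] = 3e9 / r[100:]**2
breaks = nonlinear_seg(P, r, sigma=0.0)
```

On this noise-free example the returned break bins include the first bin, the last bin, and the jump at bin 100; around a sharp discontinuity the recursion may also emit a few intermediate break bins, which the fitting stage then handles segment by segment.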

Universally, in a range with an ideal homogeneous atmosphere, 99% of the difference between *P*(*r*) and *P*_{s}(*r*) can be considered to lie in the range ± 3*σ*, where *σ* represents the standard deviation of *e*(*r*). We compute *σ* at a high altitude, where the signal can be considered to consist of *e*(*r*) only. Given that the break bins we determine are generally the local minima and maxima of *P*(*r*) in segmentation, 99% of the difference between *P*(*r*) and *P*_{s}(*r*) can be considered to lie in the corresponding range of 0 to 6*σ*. Therefore, theoretically, we can simply define *d*_{thr} as 6*σ*. However, in practice, the *P*(*r*) signal is strong and the SNR is high in the near range. Even a small change (e.g., 1%) in the atmospheric optical properties can cause a change in *P*(*r*) larger than 6*σ*. Consequently, the signal will be segmented every few bins if we apply 6*σ* as *d*_{thr}, because the atmospheric optical properties change dramatically in the near range. Accordingly, many small changes are detected as layers, thus resulting in a low detection efficiency. However, we ignore the small changes and only focus on the dramatic ones. Therefore, a tolerance of the change in the atmospheric optical properties, ∆*P*, can be considered in segmentation. We define *d*_{thr} as follows:

$$d_{thr}=\Delta P+6\sigma \qquad (5)$$

∆*P* can be defined as zero to a few percent of the mean of *P*(*r*) of the current segment. In this study, we defined ∆*P* as 5% of the mean *P*(*r*) of the current segment, but selecting an appropriate ∆*P* for a specific application is the user's own task, because it depends on how small a variation the specific application needs to detect.

An example of the first segmentation step for a real measured signal (excluding the affected overlap range) using the proposed algorithm is shown in Fig. 1. In Fig. 1(a), *d*_{m}, which is equal to 67.36, is obtained at the altitude of 675 m, and it is larger than *d*_{thr}. The signal is then segmented into the left and right parts, as shown in Fig. 1(b). The *P*_{s}(*r*) values of the left and right parts shown in Fig. 1(b) are closer to *P*(*r*) than the first whole *P*_{s} in Fig. 1(a). The recursive segmentation of the left and right parts continues until *d*_{m} for any segment is less than *d*_{thr}.

The extinction coefficient can be obtained using the segmentation algorithm expressed in Eq. (4). However, *α*_{s}(*r*) is coarse because it is obtained from the start and end bins of a segment, which are generally extreme values. Therefore, we apply nonlinear least squares fitting to obtain a more precise estimate of the extinction coefficient based on the segmentation result for the subsequent layer detection [22]. Formally, we derive the *α*_{f} and *C*_{f} that minimize Δ for every piecewise segment as follows:

$$\Delta=\sum_{i=1}^{n}\left\{P(r_{i})-\frac{C_{f}}{r_{i}^{2}}\exp\left[-2\,\alpha_{f}\,(i-1)\,\Delta r\right]\right\}^{2},\qquad r_{i}=r_{1}+(i-1)\,\Delta r \qquad (6)$$

where *∆r* is the vertical resolution. In the nonlinear least squares fitting, we use *C*_{s} and *α*_{s} as the initial values of *C*_{f} and *α*_{f}, respectively, to save time during the iteration.
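The minimization of Δ can be sketched with an off-the-shelf nonlinear least squares routine; below we use `scipy.optimize.curve_fit` (our choice of solver, not specified in the paper), seeded with the endpoint estimates as described above:

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_segment(P, r):
    """Refine the endpoint estimates of one segment by nonlinear least
    squares: fit C_f and alpha_f of the ideal model, Eq. (2), using
    C_s and alpha_s (Eqs. (3)-(4)) as the initial values."""
    Pr2 = P * r**2
    C_s = Pr2[0]
    alpha_s = (np.log(Pr2[-1]) - np.log(Pr2[0])) / (-2.0 * (r[-1] - r[0]))

    def model(rr, C, alpha):
        # ideal homogeneous signal, Eq. (2)
        return C / rr**2 * np.exp(-2.0 * alpha * (rr - r[0]))

    (C_f, alpha_f), _ = curve_fit(model, r, P, p0=(C_s, alpha_s))
    return C_f, alpha_f

# Noise-free homogeneous segment: the fit recovers the true constants.
r = np.linspace(1000.0, 2000.0, 101)
P = 5e8 / r**2 * np.exp(-2.0 * 2e-4 * (r - r[0]))
C_f, alpha_f = fit_segment(P, r)
```

Seeding the solver with *C*_{s} and *α*_{s} matters in practice: the two parameters differ by many orders of magnitude, and a poor starting point can slow or derail the iteration.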

#### 2.4 Layer base, peak, and top detection

In previous studies, the slope of a lidar signal has mostly been utilized for detecting a layer because it represents the signal change straightforwardly [5, 6, 8, 10, 12, 23]. A layer base is detected when the slope changes from negative to positive. However, the slope of the signal is inaccurate for detecting layers because a lidar signal is affected by the inverse square law and by transmittance. Therefore, the slope may misrepresent the change in the physical and optical properties of aerosol and cloud layers; e.g., a small slope may correspond to a large extinction coefficient and vice versa. For instance, we simulated a lidar signal with a layer, as shown in Fig. 2(a), to illustrate the difference between the slope and the extinction in lidar layer detection. The layer cannot be detected by a slope method because the signal is continually decreasing and the slope is always negative, as shown in Fig. 2(b). However, the layer can be determined based on the extinction coefficient by comparing it with the referential extinction derived from clear air, as shown in Fig. 2(c). Thus, an extinction-based method is expected to respond better to the change in the atmospheric optical properties than a slope-based method.
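This failure mode of the slope test can be reproduced numerically. The sketch below is our own construction in the spirit of Fig. 2 (the layer shape and all parameter values are ours, not the paper's simulation): a gentle backscatter enhancement never reverses the overall 1/*r*^{2} decay, so the slope stays negative everywhere, while the segment-wise endpoint extinction of Eq. (4) turns negative on the layer's rising flank:

```python
import numpy as np

# Simulated signal: a weak Gaussian backscatter layer at 2 km embedded
# in a monotonically decreasing P(r).
r = np.linspace(1000.0, 3000.0, 201)
dr = r[1] - r[0]
beta = 1.0 + 0.3 * np.exp(-(((r - 2000.0) / 300.0) ** 2))  # layer backscatter
alpha = 1e-5                                               # clear-air extinction [1/m]
T2 = np.exp(-2.0 * alpha * (r - r[0]))                     # two-way transmittance
P = 1e9 * beta * T2 / r**2

slope = np.diff(P) / dr          # negative everywhere: the slope test sees no base
Pr2 = P * r**2

def seg_alpha(i, j):
    """Endpoint extinction estimate, Eq. (4), over bins i..j."""
    return (np.log(Pr2[j]) - np.log(Pr2[i])) / (-2.0 * (r[j] - r[i]))

a_clear = seg_alpha(0, 40)       # clear segment: small positive extinction
a_flank = seg_alpha(70, 100)     # rising flank of the layer: negative extinction
```

The negative fitted extinction on the rising flank is exactly the BPR signature the algorithm exploits, even though no bin-to-bin increase of *P*(*r*) ever occurs.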

For the nonlinear segmentation algorithm, the fitted extinction coefficient *α*_{f}(*r*) is generally positive for a region where the signal decreases, such as the clear region and the peak-to-top region of a layer. By contrast, *α*_{f}(*r*) is close to zero or negative for the BPR. Therefore, we can easily define the BPR as the region where *α*_{f}(*r*) is negative and identify the first and last bins of the BPR as the base and peak of a layer, respectively. The break bins we identify on the basis of *P*(*r*) are generally the range bins whose ln[*P*(*r*)*r*^{2}] values are the local minima and maxima of the segmentation. Therefore, we consider that the manner in which our algorithm interprets the layer base is the same as that in previous studies [5, 9, 12, 24]. However, our algorithm derives the range bins whose ln[*P*(*r*)*r*^{2}] values are local minima according to the nonlinear fitting, and these can be considered the layer base bin. Our algorithm improves on those used in previous studies, which select the range bins of the local minima of *P*(*r*) as the layer base bin [5, 10, 13], because the increase in the signal close to the base is subdued in *P*(*r*) by the 1/*r*^{2} effect [6, 12].

Many studies selected the layer top as the first bin above the layer base where *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2} [5, 6, 10]. This definition of the layer top is coarse because the attenuation of the layer is disregarded. Consequently, deriving the layer transmittance and the optical depth based on the identified base and top is difficult, because the definition is generally based on the ratio of the *P*(*r*)*r*^{2} of the base and top. Instead, we search for and define the first range bin above the primary layer top in which *α*_{f}(*r*) is close to the extinction coefficient of ideal clear air as the layer top. The extinction of ideal clear air is computed based on the Rayleigh scattering theory with the US standard atmosphere. We initially search for the layer top starting from the first range bin where *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2}, to avoid selecting as the layer top the bins around the layer peak, whose *α*_{f}(*r*) can be the same as that of clear air. These criteria for selecting the layer top are adequate not only because the *α*(*r*) of a layer is always many times greater than that of clear air but also because the *α*_{f}(*r*) of the regions preceding the first bin with *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2} is generally many times larger than that of clear air: the nonlinear fitting method misinterprets the signal decrease (caused by the backscatter decrease) mainly as the attenuation of a larger extinction coefficient [21]. This misinterpretation is caused by assuming that *β*(*r*) is constant during the fitting.
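The base/peak/top logic of steps (3) above can be sketched on a per-bin array of fitted extinction values. The function below is an illustration only: it assumes *α*_{f} has already been expanded from segments to range bins, and the tolerance factor `tol` for "close to clear air" is our own placeholder, not a value given in the paper:

```python
import numpy as np

def detect_layer(alpha_f, alpha_clear, tol=2.0):
    """Locate base, peak, and top bins from per-bin fitted extinction.

    alpha_f     : fitted extinction alpha_f(r), one value per range bin
    alpha_clear : referential clear-air (Rayleigh) extinction per bin
    tol         : factor above which alpha_f counts as 'far' from clear air
    Returns (base, peak, top) bin indices, or None if no BPR exists.
    """
    negative = alpha_f < 0                 # BPR: bins with negative alpha_f
    if not negative.any():
        return None
    base = int(np.argmax(negative))        # first bin of the BPR
    peak = base
    while peak + 1 < len(alpha_f) and negative[peak + 1]:
        peak += 1                          # last bin of the BPR
    # top: first bin above the peak where alpha_f relaxes to clear air
    top = peak + 1
    while top < len(alpha_f) - 1 and alpha_f[top] > tol * alpha_clear[top]:
        top += 1
    return base, peak, top

# Toy profile: clear air, a BPR (negative alpha_f), then an attenuating
# peak-to-top region before alpha_f relaxes back to the clear-air value.
alpha_clear = np.full(20, 1e-5)
alpha_f = np.full(20, 1e-5)
alpha_f[5:9] = -1e-3
alpha_f[9:12] = 5e-4
base, peak, top = detect_layer(alpha_f, alpha_clear)
```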

Generally, the thicker a layer is, the more easily it can be detected. In the following, we only discuss the requirement for detecting an optically and geometrically thin layer. Assuming that *C*/*r*_{b}^{2}·*β*_{m}(*r*_{b})·*T*^{2}(*r*_{b}) is equal to *C*/*r*_{p}^{2}·*β*_{m}(*r*_{p})·*T*^{2}(*r*_{p}), where *T*^{2}(*r*) is the two-way transmittance at *r* (i.e., the molecular contributions to the signal at the base and the peak are equal), the difference *d* between *P*(*r*_{p}) and *P*(*r*_{b}) can be written as *C*/*r*_{p}^{2}·*β*_{p}(*r*_{p})·*T*^{2}(*r*_{p}) ± 6*σ*, and a layer will be detected when *d* is larger than *d*_{thr}. Theoretically, ignoring ∆*P*, we have no difficulty detecting a layer when [16]:

$$\frac{12\,\sigma\,r_{p}^{2}}{C\,T^{2}(r_{p})}<\beta_{p}(r_{p}) \qquad (7)$$

where *β*_{p}(*r*_{p}) is the particle backscatter coefficient at *r*_{p}. The accuracy of the base (top) detection depends on how fast *β*_{p}(*r*) increases (decreases) from zero to the left part of Eq. (7) (from the left part of Eq. (7) to zero).

The initially selected bases and tops are generally extreme values that are sensitive to noise and cause larger uncertainty. We use the previously proposed “extrapolate into the layer” strategy to overcome this problem [16]. We extrapolate the fitted signals of the nearest “clear” segmentations of the layer into the layer, up to the layer peak. We refine the estimate of the layer base (top) by examining *P*(*r*) from the peak toward the lower (higher) region until it is less than the extrapolated signal. The refining process is iterative, and we refine the base (top) until it matches the former one. Notably, the selected top can be artificial when the laser beam is unable to penetrate an optically thick layer, because we simply select the first bin that has no signal (only noise) as the top [6]. This implies biases in statistics based on the determined cloud top height, which should therefore be used carefully in statistical studies.
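The base-refinement half of the “extrapolate into the layer” strategy can be sketched as follows. This is our own minimal rendering for a single pass (the paper iterates the refinement and applies the mirror-image walk for the top); the clear-segment fit is extended into the layer with the ideal model of Eq. (2), and the walk stops at the first bin that falls below it:

```python
import numpy as np

def refine_base(P, r, peak, clear_C, clear_alpha, r0):
    """Refine a layer-base estimate: extrapolate the fitted signal of the
    clear segment below the layer into the layer, then walk down from the
    peak until P(r) drops below the extrapolated signal."""
    # extrapolated clear-air signal from the ideal model, Eq. (2)
    P_ext = clear_C / r**2 * np.exp(-2.0 * clear_alpha * (r - r0))
    base = peak
    while base > 0 and P[base] >= P_ext[base]:
        base -= 1
    return base + 1    # first bin where the signal exceeds the extrapolation

# Synthetic check: the signal sits just below the extrapolated clear-air
# model except over bins 45..55, where the layer lifts it above.
r = np.linspace(1000.0, 2000.0, 101)
P_model = 1e9 / r**2
P = 0.999 * P_model.copy()
P[45:56] = 2.0 * P_model[45:56]
base = refine_base(P, r, peak=50, clear_C=1e9, clear_alpha=0.0, r0=r[0])
```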

#### 2.5 Layer classification

The cloud and aerosol layers cause sharp signal changes. However, the changes in an aerosol layer are generally weaker than those in a cloud layer. The intensity of these changes can simply be represented by the signal peak-to-base ratio (ratio of the layer peak and base) defined as follows [12]:

$$R=\frac{P(r_{p})\,r_{p}^{2}}{P(r_{b})\,r_{b}^{2}} \qquad (8)$$

where *P*(*r*_{p})*r*_{p}^{2} and *P*(*r*_{b})*r*_{b}^{2} are the range-corrected signals of the layer peak and base, respectively. The peak-to-base ratio has been successfully used to classify layers in previous studies [6, 13, 15]. In this study, we also selected the value of 4 as the threshold for layer classification; this value is based on a detailed statistical analysis and Bayesian classification theory. Therefore, if the peak-to-base ratio of a layer is less than 4, the layer is an aerosol layer; otherwise, the layer is a cloud layer. Notably, we consider all layers above 7.5 km as clouds, as we hypothesize that aerosol layers do not occur at that height [13]. We use the average peak-to-base ratio of connected layers for classification, as in previous studies, to obtain more robust results [13]. Layers with *R* greater than 4 can be safely classified as clouds because only a few optically thick dust aerosols have been observed and reported around our lidar site. Nevertheless, for an area where optically thick aerosols are frequently observed, the threshold of *R* should be carefully selected on the basis of the local data set.
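The classification rule reduces to a few lines. In the sketch below (our own phrasing of the rule), a ratio of exactly 4 is treated as a cloud, matching "less than 4 is an aerosol layer; otherwise, a cloud"; the height argument stands for the layer height used in the 7.5 km rule:

```python
def classify_layer(R, layer_height_km, threshold=4.0):
    """Classify a layer from its mean peak-to-base ratio R (Eq. (8)).
    Layers above 7.5 km are treated as clouds regardless of R [13]."""
    if layer_height_km > 7.5:
        return "cloud"
    return "cloud" if R >= threshold else "aerosol"
```

The threshold is exposed as a parameter precisely because, as noted above, regions with frequent optically thick aerosols may need a locally tuned value.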

## 3. Results and discussion

#### 3.1 Testing with a measured signal

We tested a lidar signal with complex layer information measured by a 532 nm lidar over Wuhan on December 25 and 26, 2008. The temporal and vertical resolutions of our lidar were 3 min and 3.75 m, respectively. Figures 3(a) and 3(c) show the segments obtained using the linear segmentation algorithm [16] and the nonlinear segmentation algorithm, respectively. The segmentations in Figs. 3(a) and 3(c) show that the lower the SNR is, the more data points a segment contains. Therefore, we can consider the segmentation adaptive: it tends to combine as many homogeneous bins as possible in one segment, which is beneficial for robust fitting to optimize the layer boundary and ensure detection. Figure 3(b) shows the fitted slope of the linear algorithm, which is close to zero in the clear air region. The extinction coefficient *α*_{f}(*r*), obtained using nonlinear least squares fitting based on the segmentation results, is shown in Fig. 3(d). Figure 3(d) shows that *α*_{f}(*r*) of clear air is close to zero and *α*_{f}(*r*) of a layer is larger. However, its variation is flatter than that of the slope of the linear segmentation. The sharper variation of the slope is mainly due to the lidar signal obeying the inverse square law and being affected by transmittance, whereas *α*_{f}(*r*) does not follow the inverse square law. The flatter variation of *α*_{f}(*r*) helps in layer detection and data retrieval. The three regions with low *α*_{f}(*r*) values can be considered clear and homogeneous. By contrast, negative or large *α*_{f}(*r*) values are due to the turbid atmosphere of those regions. If *α*(*r*) increases with range, then the nonlinear fitting will yield a negative *α*_{f}(*r*). Selecting an ideal region based on *α*_{f}(*r*) is easy because an unreasonably large positive *α*_{f}(*r*) is greater than that of the ideal atmosphere. Figure 3(f) shows the four layers detected by comparing the original signal and the extrapolation line. The four layer boundaries are consistent with those of the linear method and “eyeball” identification. We can define regions A and B as aerosol layers and region C as a cloud layer based on the peak-to-base ratio. The optimization strategy of the algorithm does not optimize the layer top at approximately 1 km, and the initial layer top is considered the final layer top. Moreover, small and large layers have been successfully identified, and all overdetections have been removed effectively. In fact, region A and all its sub-regions belong to the atmospheric boundary layer and entrainment zone and can be classified as an aerosol layer because aerosol is dominant in these regions. However, we only detected and classified region A as the aerosol layer to test our algorithm in this study. We will use our algorithm to detect other regions in future works.

#### 3.2 Testing with simulated signals

We conducted extensive simulation studies to further compare the performances of the three types of algorithms. We simulated four groups of signals by combining the pure signal with Gaussian noise of zero mean but different standard deviations. The pure signal is generated based on a standard atmospheric condition with a Gaussian-peak layer at 4 km to 5 km. The optical thickness and lidar ratio of the layer are 0.014 and 20 sr, respectively. The standard deviations of the noise arrays of Figs. 4(b)–4(d) are two, three, and four times that in Fig. 4(a). Figure 4 shows 100 test results under the four noise levels. In the figure, black, green, and orange denote the statistics of the simple multiscale algorithm, the linear segmentation algorithm, and the nonlinear segmentation algorithm, respectively. The statistics show that all three algorithms exhibit satisfactory accuracy, although the layer base (top) of each is overestimated (underestimated), as the signal enhancement near the layer base and top is inconspicuous and easily covered by noise. Figures 4(a)–4(d) show that the random error and deviation increase with decreasing SNR. The accuracies of base detection of the three algorithms are close. However, for layer top detection, the underestimation of the multiscale algorithm is more serious than that of the other two algorithms, because the multiscale algorithm simply selects the first point where *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2} as the layer top, which ignores the attenuation of the layer. The two segmentation algorithms can handle layer attenuation well because of the contribution of the “extrapolate into the layer” strategy. Figures 4(a)–4(d) also show that the layer top detection of the nonlinear segmentation algorithm has a smaller deviation than that of the linear segmentation algorithm, because the extinction coefficient of the nonlinear fitting responds better to the change in the atmospheric optical properties than the slope of the linear fitting. The linear segmentation algorithm is based on the slope, which may misidentify a range bin with a small slope as a “layer top”, even though a small slope may correspond to a large extinction coefficient. The nonlinear segmentation algorithm, by contrast, can correctly determine that the signal belongs to the layer based on the fitted extinction coefficient. In the cases with the lowest SNR, the detection accuracies of the two segmentation algorithms are close to each other because the information on the layer top is covered by high-level noise; therefore, the advantages of the nonlinear segmentation algorithm cannot be fully realized. We suggest that the signal be de-noised before applying the algorithm when the noise level is high.

#### 3.3 Testing with a sequence of real signals

We tested the performance of the three algorithms with 10 h of data containing complex layer information observed with the 532 nm lidar on December 25 and 26, 2008 in Wuhan, as shown in Fig. 5. Clouds are observed at 8 km to 11 km and at 2 km to 4 km, and the boundary layer lies below 2 km. As shown in Fig. 5, after 22:00, when an optically thick layer is present, the lidar detection range is limited, because a lidar usually cannot penetrate a layer with an optical thickness larger than 3. Figure 5(a) shows the aerosol (gray curve) and cloud (black curve) boundaries detected using the simple multiscale algorithm. Figures 5(b) and 5(c) show the results of the linear and nonlinear segmentation algorithms, respectively. Figure 5 shows that all three algorithms detect the layers satisfactorily. The results are consistent with each other, and the accuracy is comparable with the visual result. The results of the multiscale and linear segmentation methods may be slightly different from our previous results [16] because we updated the preprocessing methods in our lidar package.

The scatterplots in Fig. 6 show the layer base (blue “o”) and the layer top (red “+”) detected using the simple multiscale algorithm and the two segmentation algorithms. If the layer peak of two algorithms is the same, then we regard them as the same layer. Figure 6 shows the regression lines of the layer base (top) of the three algorithms. The regression lines of the base match the “1:1” line well. However, Fig. 6(a) shows that the regression line of the layer top is below the “1:1” line, as the linear segmentation algorithm can process the attenuation of the optically thick clouds at 3 km to 4 km by linear fitting, but the simple multiscale algorithm cannot. Furthermore, the nonlinear segmentation algorithm can better process the attenuation of the optically thick clouds at 3 km to 4 km by fitting the extinction coefficient: Fig. 6(c) shows that the regression line of the layer top is lower than the “1:1” line compared with that in Fig. 6(a), and the same result is shown in Fig. 6(e). Figure 6 shows that layer bases below 2 km exhibit good consistency, mainly because the aerosol layer is geometrically and optically thin and its attenuation can be neglected. However, Figs. 6(d) and 6(f) show that layer bases above 7 km exhibit better consistency than the lower layers, mainly because the effects of layer attenuation are covered by the high-level noise. When *P*(*r*)*r*^{2} ≤ *P*(*r*_{b})*r*_{b}^{2} occurs between layer peaks, the simple multiscale algorithm separates one layer with multiple peaks into more than one layer, even if no clear atmosphere exists between the layer peaks; this is the source of the outliers in regions A–D highlighted in Fig. 6. Furthermore, layer detection of the linear segmentation algorithm is based on the slope. In several situations, the algorithm divides a single layer into two layers, as a small slope may correspond to a layer with a larger extinction coefficient. The nonlinear segmentation algorithm avoids this problem because it is based on the extinction coefficient instead of the slope. The difference between the two types of segmentation algorithms can be the source of the outliers in regions E and F, as shown in Fig. 6.

## 4. Summary

Precisely detecting cloud and aerosol layers in lidar signals has been a challenge for many years. To overcome this problem, we proposed a layer detection algorithm based on a novel segmentation algorithm that uses the lidar equation. Our algorithm overcomes several limitations of traditional algorithms, which usually require selecting a window size or a threshold on the number of increasing bins. We tested the performance of the algorithms in layer detection using measured and simulated lidar data spanning an extensive range of noise levels.

Given that lidar signals are nonlinear, linear segmentation is not sufficiently accurate for layer detection. Thus, we used the lidar equation instead of a linear function during segmentation, operating on the *P*(*r*) signal rather than on the *P*(*r*)*r*^{2} signal. The detection results show that we can derive the positions of the layer base and top and successfully detect layers in general. Experiments with simulated and real signals show that our algorithm can detect layers at acceptable noise levels. However, in the final two simulated cases with low SNR, the improvement is inconspicuous; we suggest that low-SNR signals be time-averaged or de-noised before layer detection. The nonlinear segmentation algorithm accounts for layer attenuation, and the segmentation results have the potential for retrieving layer transmittance, lidar ratio, and optical thickness.
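The difference between the two kinds of segment fitting can be sketched as follows, under assumed simplifications (a single homogeneous segment with constant extinction, a noiseless synthetic profile, and illustrative constants; this is not the paper's implementation): the nonlinear fit uses the lidar-equation form *P*(*r*) = *C* exp(−2*αr*)/*r*^{2} and returns the extinction coefficient directly, whereas the linear approach recovers it from the slope of ln[*P*(*r*)*r*^{2}].

```python
import numpy as np
from scipy.optimize import curve_fit

def lidar_model(r, C, alpha):
    """Homogeneous-segment lidar equation: P(r) = C * exp(-2*alpha*r) / r^2,
    with alpha the extinction coefficient (km^-1) and C a system constant."""
    return C * np.exp(-2.0 * alpha * r) / r**2

# Synthetic segment: extinction 0.5 km^-1 between 3 and 4 km (assumed values)
r = np.linspace(3.0, 4.0, 50)            # range bins (km)
p = lidar_model(r, 100.0, 0.5)           # noiseless signal

# Nonlinear fit on P(r): recovers the extinction coefficient directly
(C_fit, alpha_fit), _ = curve_fit(lidar_model, r, p, p0=(50.0, 0.1))

# Linear fit on ln(P(r) r^2): the slope equals -2*alpha only when
# attenuation is the sole range dependence of the corrected signal
slope, intercept = np.polyfit(r, np.log(p * r**2), 1)

print(alpha_fit, -slope / 2.0)  # both ≈ 0.5 on noiseless data
```

On noisy or inhomogeneous segments the two estimates diverge, which is the practical motivation for fitting the lidar equation rather than a straight line.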

In this study, only signals at 532 nm have been used as examples. However, our algorithm can be applied to a lidar signal at an arbitrary wavelength, for which the reference extinction must be computed from Rayleigh scattering theory at the corresponding wavelength. Layer detection here is based on the segmentation algorithm; given that this segmentation algorithm is pioneering work, its potential in further applications of lidar data processing should be examined. Improved or new segmentation algorithms are required in future studies, and more statistical analyses should be conducted. Additional signal features should be tested in the classification. Furthermore, multiple scattering effects should be considered in the detection of optically thick layers.
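For instance, since molecular (Rayleigh) extinction scales approximately as λ^{−4}, a 532 nm reference profile can be rescaled to another elastic wavelength; the sketch below uses the classic λ^{−4} approximation (the more exact exponent is about 4.09), and the function name and sample values are assumptions for illustration only:

```python
def rescale_rayleigh_extinction(alpha_532, wavelength_nm, exponent=4.0):
    """Rescale a 532 nm molecular extinction value/profile to another wavelength.

    alpha_532     : Rayleigh extinction at 532 nm (scalar or array, km^-1)
    wavelength_nm : target lidar wavelength in nm
    exponent      : Rayleigh wavelength exponent; 4.0 is the classic
                    approximation (a more exact value is about 4.09)
    """
    return alpha_532 * (532.0 / wavelength_nm) ** exponent

# Example: rescale a 532 nm value for a 1064 nm channel
print(rescale_rayleigh_extinction(1.0e-2, 1064.0))  # 0.5**4 * 1e-2 = 6.25e-4 km^-1
```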

## Acknowledgments

This work was supported by the National Natural Science Foundation of China (NSFC) (41127901), the Program for Innovative Research Team in University of Ministry of Education of China (IRT1278), the National Science Foundation of Hubei Province (2015CFA002), the Major Projects of Hubei Collaborative Innovation Center for High-efficiency Utilization of Solar Energy (HBSZD2014002), the China Postdoctoral Science Foundation (2015M570667), and the Fundamental Research Funds for the Central Universities (2042015kf0015).

## References and links

**1. **K. N. Liou, *An Introduction to Atmospheric Radiation* (Academic Press, 2002).

**2. **V. A. Kovalev and W. E. Eichinger, *Elastic Lidar: Theory, Practice, and Analysis Methods* (Wiley-Interscience, 2004).

**3. **J. Bösenberg, A. Ansmann, J. M. Baldasano, D. Balis, C. Böckmann, B. Calpini, A. Chaikovsky, P. Flamant, A. Hågård, and V. Mitev, “EARLINET: a European aerosol research lidar network,” in *Advances in Laser Remote Sensing* (SPIE, 2001), pp. 155–158.

**4. **D. M. Winker, J. R. Pelon, and M. P. McCormick, “The CALIPSO mission: Spaceborne lidar for observation of aerosols and clouds,” Proc. SPIE **4893**, 1–11 (2003). [CrossRef]

**5. **S. R. Pal, W. Steinbrecht, and A. I. Carswell, “Automated method for lidar determination of cloud-base height and vertical extent,” Appl. Opt. **31**(10), 1488–1494 (1992). [CrossRef] [PubMed]

**6. **F. Mao, W. Gong, and Z. Zhu, “Simple multiscale algorithm for layer detection with lidar,” Appl. Opt. **50**(36), 6591–6598 (2011). [CrossRef] [PubMed]

**7. **V. A. Kovalev, J. Newton, C. Wold, and W. M. Hao, “Simple algorithm to determine the near-edge smoke boundaries with scanning lidar,” Appl. Opt. **44**(9), 1761–1768 (2005). [CrossRef] [PubMed]

**8. **S. A. Young, “Analysis of lidar backscatter profiles in optically thin clouds,” Appl. Opt. **34**(30), 7019–7031 (1995). [CrossRef] [PubMed]

**9. **D. M. Winker and M. A. Vaughan, “Vertical distribution of clouds over Hampton, Virginia observed by lidar under the ECLIPS and FIRE ETO programs,” Atmos. Res. **34**(1-4), 117–133 (1994). [CrossRef]

**10. **F. Mao, W. Gong, J. Li, and J. Zhang, “Cloud detection and coefficient retrieve based on improved differential zero-crossing method for Mie lidar,” Acta Opt. Sin. **30**(11), 3097–3102 (2010). [CrossRef]

**11. **J. R. Campbell, K. Sassen, and E. J. Welton, “Elevated cloud and aerosol layer retrievals from micropulse lidar signal profiles,” J. Atmos. Ocean. Technol. **25**(5), 685–700 (2008). [CrossRef]

**12. **Z. Wang and K. Sassen, “Cloud type and macrophysical property retrieval using multiple remote sensors,” J. Appl. Meteorol. **40**(10), 1665–1682 (2001). [CrossRef]

**13. **Y. Morille, M. Haeffelin, P. Drobinski, and J. Pelon, “STRAT: An automated algorithm to retrieve the vertical structure of the atmosphere from single-channel lidar data,” J. Atmos. Ocean. Technol. **24**(5), 761–775 (2007). [CrossRef]

**14. **C. Zhao, Y. Wang, Q. Wang, Z. Li, Z. Wang, and D. Liu, “A new cloud and aerosol layer detection method based on micropulse lidar measurements,” J. Geophys. Res. **119**, 6788–6802 (2014).

**15. **W. Gong, F. Mao, and S. Song, “Signal simplification and cloud detection with an improved Douglas-Peucker algorithm for single-channel lidar,” Meteorol. Atmos. Phys. **113**(1-2), 89–97 (2011). [CrossRef]

**16. **F. Mao, W. Gong, and T. Logan, “Linear segmentation algorithm for detecting layer boundary with lidar,” Opt. Express **21**(22), 26876–26887 (2013). [CrossRef] [PubMed]

**17. **C. Li, Z. Pan, F. Mao, W. Gong, S. Chen, and Q. Min, “De-noising and retrieving algorithm of Mie lidar data based on the particle filter and the Fernald method,” Opt. Express **23**(20), 26509–26520 (2015). [CrossRef]

**18. **W. Gong, F. Mao, and J. Li, “OFLID: Simple method of overlap factor calculation with laser intensity distribution for biaxial lidar,” Opt. Commun. **284**(12), 2966–2971 (2011). [CrossRef]

**19. **D. H. Douglas and T. K. Peucker, “Algorithms for the reduction of the number of points required to represent a digitized line or its caricature,” Cartographica **10**(2), 112–122 (1973).

**20. **E. Keogh, S. Chu, D. Hart, and M. Pazzani, “An online algorithm for segmenting time series,” in Proceedings of the 2001 IEEE International Conference on Data Mining (IEEE, 2001), 289–296. [CrossRef]

**21. **F. Mao, W. Wang, Q. Min, and W. Gong, “Approach for selecting boundary value to retrieve Mie-scattering lidar data based on segmentation and two-component fitting methods,” Opt. Express **23**(11), A604–A613 (2015). [CrossRef] [PubMed]

**22. **F. Rocadenbosch, A. Comerón, and L. Albiol, “Statistics of the slope-method estimator,” Appl. Opt. **39**(33), 6049–6057 (2000). [CrossRef] [PubMed]

**23. **F. Mao, W. Gong, S. Song, and Z. Zhu, “Determination of the boundary layer top from lidar backscatter profiles using a Haar wavelet method over Wuhan, China,” Opt. Laser Technol. **49**, 343–349 (2013). [CrossRef]

**24. **F. Rocadenbosch, M. Sicard, M. N. M. Reba, and S. Tomas, “Morphological tools for range-interval segmentation of elastic lidar signals,” in *IEEE International Geoscience and Remote Sensing Symposium* (IGARSS, 2007), pp. 4372–4375. [CrossRef]