Water optical clustering based on water color information is important for many regional and global ecological and environmental applications. Fuzzy clustering avoids the sharp boundaries between types produced by hard clustering methods and is therefore advantageous. However, to make good use of fuzzy clustering on water color spectra data sets, the key factor is the choice of the fuzzifier parameter (m) of FCM (fuzzy c-means). By default, m is usually set to 2. Unfortunately, this default assigns substantial membership degrees to non-belonging water types and, in some cases, especially in inland eutrophic waters, fails to recover a distinct cluster structure. To overcome this shortcoming, we propose an improved FCM method (FCM-m) for water color spectra classification that optimizes the fuzzifier parameter. We collected an inland data set of 1280 in situ spectra with co-measured water quality parameters covering a wide range of biogeochemical variability in China. Using FCM-m with the optimized fuzzifier (m = 1.36), seven spectrally distinct water optical clusters were obtained on Sentinel-3 OLCI (Ocean and Land Colour Imager) bands, and the quality of the clustering was confirmed by the Fuzzy Silhouette Index (0.513). The FCM-m-based soft classification framework was also successfully applied to atmospherically corrected OLCI images, and the results were evaluated against previous case studies. In addition, by testing FCM-m on three coastal and oceanic data sets, we found that the optimized m should be adjusted to the data set itself and, in general, approaches 1 as the number of bands (dimension) increases. Finally, the improved method was tested in Chlorophyll-a concentration estimation.
The results show that algorithm blending based on FCM-m performs better than blending based on the original FCM, mainly because FCM-m reduces the estimation error from non-belonging clusters through stricter membership assignment. In summary, we believe that FCM-m is an adaptive algorithm that should be tested on more public data sets; its R code is available at https://github.com/bishun945.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
The Case 1 and Case 2 scheme, first defined qualitatively by Morel and Prieur according to the predominance of phytoplankton and other covarying compounds, is commonly used to classify global water types for bio-optical modeling. The idealized concept of Case 1 water underpinned the success of the first generation of bio-optical models and, to some extent, contributed to the advent of ocean color satellites. However, the development and refinement of satellite sensors led the ocean color community to pay more attention to Case 2 waters, which have more complex bio-optical properties [3–5]. It is generally accepted that no universal set of model parameters applies to all water types [6,7]. Case 2 waters have therefore been further divided into more detailed types based on the trophic gradient, on region (i.e., using a regional algorithm for a particular study area), or on AOPs (apparent optical properties). AOPs describe the optical properties of water near or above the water surface and can be derived indirectly from satellite records given an effective atmospheric correction. AOPs are commonly used as the basis for water classification [2,11,12], in the form of the Forel-Ule index, the optical spectrum [12,14], or the diffuse attenuation coefficient (Kd) [15,16]. Based on the spectral shape of Kd(λ), Jerlov classified waters into three oceanic and five coastal types, a scheme that conveniently describes water clarity. A classic example of AOP differences is shown in Fig. 1, where four Chinese inland lakes show considerable variations in both the magnitude and spectral shape of Rrs (remote sensing reflectance, an AOP).
Consequently, AOP-based water classification has long been popular for both Case 1 and Case 2 waters, with applications such as selecting the best model for waters in a given bio-optical state [9,14,18–20] and flag masking of satellite maps [21,22].
AOP-based classification strategies generally fall into hard clustering and fuzzy clustering (or soft classification). Hard schemes include hierarchical clustering, HCM (Hard C-Means), and ISODATA (Iterative Self-Organizing Data Analysis Technique); fuzzy schemes include FCM (Fuzzy C-Means), FPCM (Fuzzy and Possibilistic C-Means), and UPFC (Unsupervised Possibilistic Fuzzy C-Means). Hard clustering usually produces uneven or discontinuous maps, violating the first law of geography, which states that near things (waters) are more related than distant things (waters). In contrast, fuzzy clustering uses “membership degrees” to express the ambiguity between different water types. Membership degrees are usually treated as “weights” in algorithm blending schemes, which removes the sharp boundaries in maps produced by hard classification schemes [12,29]. Because AOPs and their variants comprehensively express water information, they are usually used as the input of fuzzy clustering schemes. These inputs vary widely and include raw AOPs [12,14], normalized AOPs [29,30], spectral slope, spectral depth, spectral angle, spectral ratio, B-spline functions, and the CIE-based hue angle.
These applications have provided useful insight into optical water clustering. However, few investigations have addressed the determination of the fuzzifier (or degree of ambiguity) in fuzzy clustering schemes [12,14,19]; in particular, the fuzzifier parameter m of FCM has not been discussed in optical water clustering. Moore et al. tested the effects of different m values in FCM and determined an appropriate value by assessing clustering quality indices. However, they did not give a detailed scheme for determining m, which makes it hard to select an appropriate fuzzifier for new data sets. Optical water data sets are usually high-dimensional, consisting of the multispectral or hyperspectral bands of satellite sensors designed for water color. It has been reported that the default fuzzifier m = 2 in FCM is not suitable for high-dimensional data sets and probably makes clustering results more indistinct [31,32]. Meanwhile, algal blooms, which often occur in inland eutrophic lakes, produce vegetation-like spectra that should be included in the clustering data set. However, Xue et al. found that FCM with the default fuzzifier did not outperform HCM in determining appropriate optical clusters for the large lakes of the Yangtze and Huai River Basins. Furthermore, Spyrakos et al. reported that the HCM scheme could obtain distinct clusters covering the bloom spectrum. Therefore, given the high dimensionality of spectral data, the inappropriate default fuzzifier, and the need to use FCM rather than HCM, we sought a compromise that retains the membership degrees provided by fuzzy clustering while ensuring that seemingly outlying objects (such as bloom spectra) are strictly separated.
The main objectives of this study are 1) to improve the FCM method so that optical clustering is more effective and applicable for inland waters, 2) to analyze the differences between the optical properties of distinct clusters, and 3) to assess the clustering results by applying them to atmospherically corrected images of various inland waters.
2. Data and methods
2.1 Data collection
A data set of 1280 in situ hyperspectral Rrs spectra from widely distributed inland waters was used in the clustering analysis (Table 1). The data set comprises data from sixteen inland lakes, reservoirs, and rivers, with co-measured concentrations of Chlorophyll-a (CChla) and total suspended matter (CTSM), Secchi disk depth (SDD), and the absorption coefficients of phytoplankton, non-phytoplankton particles, and colored dissolved organic matter (aph, aNAP, and aCDOM, respectively) within 350-800 nm.
The in situ Rrs was measured with a FieldSpec spectroradiometer (Analytical Spectral Devices, Inc., Boulder, CO, USA) following the above-water method (see more details in ). The hyperspectral Rrs spectra were resampled to 15 OLCI bands using the sensor's spectral response functions (https://sentinel.esa.int), excluding the oxygen absorption bands (761, 764, and 768 nm) and the adjacency-contaminated band (1020 nm, reported in ). This resampling also acts as a dimension reduction that, to some extent, suppresses noise in the hyperspectral observations. In situ water samples were collected from the surface with Niskin bottles and frozen at −20 °C for laboratory analysis. CChla, CTSM, and the concentrations of inorganic and organic suspended matter (ISM and OSM) were determined following previous studies [36,37]. aph and aNAP were obtained with the quantitative filter technique described by Zhang et al. aCDOM was measured with a Shimadzu UV-2250 spectrophotometer over 240-880 nm, as described by Mitchell et al. All measurements were visually inspected to remove obvious errors.
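The band resampling described above amounts to an SRF-weighted average of the hyperspectral spectrum. The following Python sketch illustrates it for a single band. The helper names `resample_to_band` and `_trapezoid` are hypothetical, and the Gaussian response in the example is a stand-in, not the actual OLCI spectral response function, which should be taken from the ESA distribution.

```python
import numpy as np

def _trapezoid(y, x):
    """Trapezoidal integral of y(x) on a 1-D grid x."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def resample_to_band(wl, rrs, srf_wl, srf):
    """Band-equivalent Rrs of one sensor band (hypothetical helper).

    wl, rrs     : hyperspectral wavelengths (nm) and Rrs (sr^-1)
    srf_wl, srf : wavelengths (nm) and relative spectral response of the band
    """
    # Put the SRF on the measurement grid; zero response outside its support
    w = np.interp(wl, srf_wl, srf, left=0.0, right=0.0)
    # Band Rrs = integral(Rrs * SRF) / integral(SRF)
    return _trapezoid(rrs * w, wl) / _trapezoid(w, wl)

# Example with a hypothetical Gaussian response centred at 665 nm (FWHM 10 nm)
wl = np.arange(400.0, 901.0, 1.0)
rrs = 0.002 + 1e-5 * (wl - 400.0)               # synthetic, linearly rising Rrs
srf_wl = np.arange(640.0, 691.0, 1.0)
srf = np.exp(-0.5 * ((srf_wl - 665.0) / (10.0 / 2.355)) ** 2)
print(resample_to_band(wl, rrs, srf_wl, srf))   # close to Rrs at 665 nm
```

Because the synthetic response is symmetric about 665 nm and the spectrum is linear, the band value equals Rrs at the band center; real spectra with peaks or troughs inside a band will differ from the center value.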
The samples were collected during multiple cruises between 2006 and 2018, covering a wide range of biogeochemical and optical variability under different seasonal conditions. The investigated waters range from turbid river-connected waters such as Lake Hongze, Lake Dongting, and the Three Gorges Reservoir, to shallow, phytoplankton-dominated waters such as Lake Chaohu, Lake Taihu, and Lake Dianchi, and to deep, low-reflectance waters such as Lake Erhai and Lake Qiandao. Previous case studies of these waters provide ample prior knowledge for validating the clustering results quantitatively [33,40,42,44–46].
2.2.1 Clustering algorithms based on fuzzy logic
Hard clustering algorithms, such as K-means, assume that each object belongs to exactly one cluster. In practice, objects often belong to more than one cluster because clusters overlap. Fuzzy clustering addresses this by quantifying the membership degree of an object to each cluster as a value between 0 and 1: a value close to 1 indicates a strong association with the cluster, and a value close to 0 a weak one. FCM, proposed by Bezdek, is a well-known fuzzy clustering algorithm whose main constraint is that the membership degrees of each object across all clusters sum to 1. For a given data set, the membership matrix and cluster centroids are obtained by minimizing the total inertia criterion. Many other fuzzy clustering algorithms, such as FPCM and UPFC, were later proposed as extensions of FCM to overcome its vulnerability to noise and outliers. These algorithms are implemented in the R package “ppclust”, which was used in this study.
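To show the alternating optimization behind FCM, the following Python/NumPy sketch implements the standard Bezdek updates. It is not the “ppclust” implementation used in this study; it assumes the Euclidean norm (norm matrix A = identity), random initial memberships, and a hypothetical function name `fcm`.

```python
import numpy as np

def fcm(X, K, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Fuzzy c-means with Euclidean distance (norm matrix A = identity).

    X : (N, p) array of spectra. Returns (U, V): U has shape (K, N) with the
    membership degrees, V has shape (K, p) with the cluster centroids.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships; each column (object) sums to 1
    U = rng.random((K, X.shape[0]))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Um = U ** m
        # Centroid update: membership^m-weighted mean of the spectra
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)
        # Squared Euclidean distance between every centroid and every spectrum
        D2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D2 = np.fmax(D2, np.finfo(float).eps)   # guard against zero distance
        # Membership update: u_ij = 1 / sum_k (d_ij^2 / d_kj^2)^(1/(m-1))
        ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
        U_new = 1.0 / ratio.sum(axis=1)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    # Recompute centroids so that V matches the returned memberships
    Um = U ** m
    V = (Um @ X) / Um.sum(axis=1, keepdims=True)
    return U, V
```

Centroids are weighted means of the spectra, and memberships are recomputed from the relative distances to all centroids, so each column of U sums to 1 by construction, which is the constraint stated above.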
Here we introduce the principle of FCM; FPCM and UPFC, as variants of FCM designed for specific data sets, are described in previous studies [25,26,47]. The spectra were first normalized by their integrals: because the composition of inland waters varies greatly, it mainly changes the spectral shape of reflectance, whereas the magnitude is dominated by water clarity [18,29,30]. Next, the cluster number (K) must be defined. Because FCM is unsupervised and its starting centers are selected randomly in each run, we varied K from 2 to 20 and repeated the clustering 1000 times on random bootstrap subsamples of 1150 spectra (approximately 90% of the data set) to avoid giving undue weight to individual spectra. For each run, the Partition Entropy (PE), Partition Coefficient (PC), Modified Partition Coefficient (MPC), and Fuzzy Silhouette Index (SIL.F), all available in the “fclust” R package, were computed and compared to determine the optimal K. K = 7 was optimal in ∼89% of the bootstrap runs and was adopted in this study (see the open-source code at https://github.com/bishun945 for details). FCM identifies the most characteristic optical water types and calculates membership degrees by minimizing the objective function
$$J_m(U,V)=\sum_{i=1}^{K}\sum_{j=1}^{N}u_{ij}^{m}\,\lVert \mathbf{x}_j-\mathbf{v}_i\rVert_A^{2},\qquad \text{subject to}\ \sum_{i=1}^{K}u_{ij}=1,\quad u_{ij}\in[0,1],$$
where N is the number of spectra, xj is the jth normalized spectrum, vi is the centroid of cluster i, uij is the membership degree of spectrum j in cluster i, m > 1 is the fuzzifier, and ‖·‖A is the distance norm induced by the matrix A.
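The integral normalization and the partition-based validity indices named above can be sketched as follows. Trapezoidal integration over band-center wavelengths is an assumption (the integration scheme is not specified here), the index definitions follow their common textbook forms rather than the “fclust” source code, and all function names are hypothetical.

```python
import numpy as np

def normalize_by_integral(wl, R):
    """Divide each spectrum (row of R) by its trapezoidal integral over wl."""
    wl = np.asarray(wl, dtype=float)
    R = np.asarray(R, dtype=float)
    area = np.sum((R[:, 1:] + R[:, :-1]) * np.diff(wl), axis=1) / 2.0
    return R / area[:, None]

def partition_coefficient(U):
    """PC = (1/N) sum_i sum_j u_ij^2; ranges from 1/K (fuzziest) to 1 (crisp)."""
    return float((U ** 2).sum() / U.shape[1])

def partition_entropy(U):
    """PE = -(1/N) sum_i sum_j u_ij ln(u_ij); 0 for a crisp partition."""
    logs = np.log(np.clip(U, 1e-300, 1.0))   # clip avoids log(0); 0 * finite = 0
    return float(-(U * logs).sum() / U.shape[1])

def modified_partition_coefficient(U):
    """MPC = 1 - K/(K-1) * (1 - PC); rescales PC to the range [0, 1]."""
    K = U.shape[0]
    return 1.0 - K / (K - 1.0) * (1.0 - partition_coefficient(U))
```

In the bootstrap procedure described above, these indices (together with SIL.F) would be evaluated for every K and every subsample: higher PC and MPC and lower PE indicate a crisper partition.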
2.2.2 Determination of fuzzifier parameter m
The fuzziness of FCM is controlled by the fuzzifier m and the norm matrix A, as shown by Dembele. Increasing m increases the fuzziness of FCM; conversely, as m approaches 1, FCM approaches HCM. Therefore, once the norm used to compute feature distances has been chosen, m deserves particular attention. Dembele and Kastner also demonstrated the importance of optimizing m by introducing a mathematical method for determining an appropriate fuzzifier for clustering DNA microarray data sets [31,32].
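The role of m can be seen directly in the FCM membership formula. The sketch below evaluates the memberships of one spectrum for a fixed, hypothetical set of distances to K = 7 centroids, using the m values examined in Fig. 3; the function name is hypothetical. In full FCM the centroids also move with m (for large m they collapse toward the data mean, so all memberships become exactly 1/K); with fixed distances, as here, memberships only approach 1/K.

```python
import numpy as np

def memberships_from_distances(d, m):
    """FCM memberships of one object from its distances d to the K centroids."""
    d = np.asarray(d, dtype=float)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

# Hypothetical distances from one spectrum to K = 7 cluster centroids
d = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
for m in (1.1, 1.36, 2.0, 3.6, 5.6):
    u = memberships_from_distances(d, m)
    print(f"m = {m:<4}: largest u = {u.max():.3f}, smallest u = {u.min():.3f}")
```

As m decreases toward 1, the exponent 2/(m − 1) grows and the nearest centroid takes almost all of the membership, which is the HCM-like limit described above.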
Many researchers have used m = 2, based on empirical studies, because it simplifies the computation of the membership matrix [51,52]. However, previous studies have shown that m = 2 may not be appropriate for all data sets [11,31,51,52]. Because m strongly affects how firmly each object is associated with its cluster center, it also affects subsequent applications based on membership degrees. In this section, we propose an improved FCM method, denoted FCM-m, for water spectral data, with a slight modification of the method of Dembele and Kastner.
The first step is to define an upper bound for m (mub): for a given data set, there is a value mub above which all membership degrees produced by FCM equal 1/K. The method assumes that, at m = mub, the coefficient of variation (cv) of the set of distances between spectra is close to 0.03p, where p is the data dimension, as follows:
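The coefficient of variation of the inter-spectrum distances that enters this criterion can be computed as in the sketch below; the relation that maps cv to mub is not reproduced here. As an illustration of why high-dimensional data call for a smaller m, the usage lines compare uniformly random data in 2 and 100 dimensions, for which the relative spread of pairwise distances shrinks as the dimension grows. The function name and the random data are hypothetical.

```python
import numpy as np

def distance_cv(X):
    """Coefficient of variation (std / mean) of all pairwise Euclidean distances."""
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=2))
    iu = np.triu_indices(X.shape[0], k=1)   # each unordered pair counted once
    d = D[iu]
    return float(d.std() / d.mean())

rng = np.random.default_rng(0)
for p in (2, 100):
    X = rng.random((200, p))                # hypothetical uniform random "spectra"
    print(f"p = {p:3d}: cv of pairwise distances = {distance_cv(X):.3f}")
```

When distances become nearly equal (small cv), the FCM membership formula spreads membership more evenly across clusters for a given m, so a smaller m is needed to keep assignments distinct.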
2.2.3 Assessment of the quality of cluster results
SIL.F was chosen to evaluate the quality of the fuzzy clustering results. Compared with the original silhouette criterion, which is widely used in crisp (hard) clustering [7,29], SIL.F was designed to better detect regions of higher data density when clusters overlap. For each spectrum j, the silhouette width sj is defined as follows. Let apj be the average dissimilarity between j and all other spectra in cluster p, to which j belongs. For every other cluster C, let d(j,C) be the average dissimilarity between j and all spectra of C. The minimum of d(j,C), bpj = minC d(j,C), can be regarded as the dissimilarity between j and its “neighbor” cluster, i.e., the nearest cluster to which it does not belong. The silhouette width is then
$$s_j=\frac{b_{pj}-a_{pj}}{\max\{a_{pj},\,b_{pj}\}}, \tag{5}$$
and SIL.F is the weighted mean of the silhouette widths:
$$\mathrm{SIL.F}=\frac{\sum_{j=1}^{N}\left(u_{pj}-u_{qj}\right)^{\alpha}s_j}{\sum_{j=1}^{N}\left(u_{pj}-u_{qj}\right)^{\alpha}}. \tag{6}$$
In Eq. (6), upj and uqj are the largest and second-largest elements of the jth column of the membership degree matrix, respectively, and α ≥ 0 is a weighting coefficient (default: 1). Both sj and SIL.F lie between −1 and 1; a negative sj indicates that spectrum j is poorly classified, whereas a value close to 1 indicates a good classification [7,32,54].
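The silhouette width and its membership-weighted average described above can be sketched in Python as follows. The sketch assumes Euclidean dissimilarities and crisp labels taken from the largest membership of each spectrum, with sj = 0 for a spectrum alone in its cluster; the function name is hypothetical and this is not the “fclust” code.

```python
import numpy as np

def fuzzy_silhouette(X, U, alpha=1.0):
    """Fuzzy silhouette index (SIL.F) of a fuzzy partition.

    X : (N, p) data; U : (K, N) membership matrix (columns sum to 1).
    Crisp labels are taken from the largest membership of each object.
    """
    X = np.asarray(X, dtype=float)
    U = np.asarray(U, dtype=float)
    N = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    labels = U.argmax(axis=0)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("need at least two non-empty clusters")
    s = np.zeros(N)
    for j in range(N):
        own = labels == labels[j]
        own[j] = False                      # a_pj excludes j itself
        if not own.any():                   # singleton cluster: s_j = 0
            continue
        a = D[j, own].mean()
        # b_pj: smallest average dissimilarity to any other cluster
        b = min(D[j, labels == c].mean() for c in clusters if c != labels[j])
        s[j] = (b - a) / max(a, b)
    top2 = np.sort(U, axis=0)[-2:, :]       # rows: second-largest, largest
    w = (top2[1] - top2[0]) ** alpha        # (u_pj - u_qj)^alpha
    return float((w * s).sum() / w.sum())
```

Objects with nearly tied largest memberships receive small weights, so SIL.F emphasizes the well-assigned cores of clusters rather than the transition zones between them.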
3.1 Cluster results and properties of optical water types
We examined the distribution of membership degrees with jitter plots for different values of m, i.e., 1.1, 1.36, 2, 3.6, and 5.6 (Fig. 3). When m approaches 1 (Fig. 3(a)), FCM gives less fuzzy (i.e., harder) memberships, similar to those obtained by HCM. In contrast, with m = 2 the membership degrees of all three algorithms (FCM, FPCM, and UPFC) were relatively low, especially for UPFC, which failed to extract any cluster structure; that is, with m = 2 the algorithms failed to associate any spectrum tightly with any cluster. With m = 1.36, however, FCM, FPCM, and UPFC all produced appropriate membership distributions in which most spectra are strongly associated with one cluster. This meets our requirement for fuzzy clustering of water spectra: a good compromise between assigning most spectra to a given cluster and identifying spectra that classify poorly [7,11,12,32]. As m increases further, the membership degree of every cluster becomes exactly 1/K, as in Fig. 3(g) with m = 5.6. Statistics not shown here indicate that m = 3.6 already reaches this upper limit for the data set. Finally, considering stability and simplicity, FCM was chosen as the algorithm to improve for fuzzy clustering of the water spectra data set.
With the optimized m = 1.36, FCM-m yielded seven clusters with a mean fuzzy silhouette width of 0.5135, indicating a good clustering structure (Table 3). The distributions of aCDOM(440), CChla, CTSM, and SDD in each cluster are shown in Fig. 4(i)–4(l). The clusters show good separation and compactness, especially in terms of CChla and CTSM, whereas aCDOM(440) is similar in most clusters and the range of SDD in Cluster 3 is much wider than in the others. Cluster 6 shows the greatest variability; it has vegetation-like spectra (Fig. 4(g)) and extremely high CChla and CTSM, reaching 1000 mg/m3 and 100 mg/L, respectively. The high CDOM absorption at 440 nm (1.0 to 3.0 m−1) results in the low reflectance of Cluster 6. Cluster 1, which has a weak reflectance peak at 709 nm, has the next-highest CChla and CTSM after Cluster 6. Like Cluster 1, Cluster 2 has high CChla (reaching 100 mg/m3), but its optical clarity, indicated by SDD, is higher owing to lower CTSM (around 32 mg/L). Clusters 1 and 2 are therefore considered productive waters. The spectra of Cluster 2 show more pronounced peaks and higher reflectance in the near-infrared (NIR) region, making Cluster 2 the optical neighbor of Cluster 6. Clusters 3 and 4 are relatively clean inland water types, with lower reflectance (Fig. 4(d) and 4(e)), lower CChla (near 10 mg/m3), and weaker CDOM absorption at 440 nm. Cluster 3 is the cleaner of the two, with CTSM below 10 mg/L, whereas CTSM in Cluster 4 exceeds 10 mg/L. Finally, Clusters 5 and 7 are turbid, sediment-dominated waters, with CTSM reaching 100 mg/L but low CChla (around 10 mg/m3); Cluster 7 is the cleaner of the two. A brief description and example waters for each cluster are given in Table 2, and the spectral values of the cluster centers at OLCI bands are provided in Data File 1.
3.2 Satellite image application in case studies
Maps of water optical clusters and membership degrees can be obtained by applying FCM-m to atmospherically corrected image reflectance. In this section, three representative lakes, Lake Taihu, Lake Hongze, and Lake Erhai, were selected to test FCM-m because their water color is highly variable, as extensively reported in previous studies [35,40,44,55–57]. Before FCM-m was applied, the selected OLCI images were atmospherically corrected with the ACbTC (Atmospheric Correction based on Turbidity Classification) method, which was designed for complex inland waters (see our previous study for details of the image processing). ACbTC achieved a mean absolute percentage error (MAPE) of 29.55% averaged over all bands, providing relatively accurate spectral magnitude and shape as input for FCM-m.
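Applying trained clusters to an image only requires the FCM membership formula with the centroids held fixed. The sketch below assumes that pixel spectra have already been atmospherically corrected, resampled to the training bands, and normalized in the same way as the training spectra; the function name is hypothetical, the default m = 1.36 is the value reported above for the OLCI data set, and land or invalid pixels would need to be masked beforehand.

```python
import numpy as np

def classify_pixels(R_img, centroids, m=1.36):
    """Membership maps for image pixels given fixed (trained) cluster centroids.

    R_img     : (rows, cols, p) normalized Rrs at the p training bands
    centroids : (K, p) cluster centres from FCM training (same normalization)
    Returns (U, label): U of shape (K, rows, cols) and the hard label map.
    """
    R_img = np.asarray(R_img, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    rows, cols, p = R_img.shape
    X = R_img.reshape(-1, p)                      # one row per pixel
    D2 = ((X[None, :, :] - centroids[:, None, :]) ** 2).sum(axis=2)
    D2 = np.fmax(D2, np.finfo(float).eps)         # guard against zero distance
    ratio = (D2[:, None, :] / D2[None, :, :]) ** (1.0 / (m - 1.0))
    U = 1.0 / ratio.sum(axis=1)                   # (K, N); columns sum to 1
    return U.reshape(-1, rows, cols), U.argmax(axis=0).reshape(rows, cols)
```

Each slice U[i] is the membership map of cluster i, and the argmax gives the cluster map; no further iteration is needed because the centroids are not updated.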
3.2.1 Lake Taihu
The true color image of Lake Taihu on July 24, 2017 shows great spatial heterogeneity of water color (Fig. 5(a)). Visually, algal blooms in the western bay appear dark green, the center of the lake appears darker, and the southwestern and southern parts appear lighter, suggesting more turbid water.
The FCM-m results, shown in Fig. 5(b), are consistent with these visual observations. Clusters 1 and 7 are mainly distributed in the northern bay (Meiliang Bay), a turbid and productive area. FCM-m also resolves detailed spatial structure in the complex western area, where water clusters form a gradient from the nearshore regions to the lake center, i.e., from Cluster 6 to Clusters 1 and 2, and then to Clusters 3 and 4. The color differences at the two northwestern estuaries can also be observed. Clusters 1 and 7, termed turbid waters, dominate the southern offshore areas. Overall, the membership maps show a “stricter” clustering result: in-cluster members are firmly assigned to their cluster, while objects (i.e., pixels with spectral information) outside the cluster are strictly rejected. For example, the membership degrees of Cluster 6 in Lake Taihu (Fig. 5(h)) resemble an HCM result to some extent, i.e., they follow a bipolar distribution concentrated at 0 and 1. However, the edges of some clusters show intermediate memberships of around 0.5 (Fig. 5(f) and 5(i)). This is what we expect from fuzzy clustering, since water transitioning from one type to another inevitably passes through a fuzzy state belonging to two (or more) clusters. In conclusion, the FCM-m clustering of the Lake Taihu OLCI image is reasonable given our prior knowledge of the lake, and FCM-m achieves a compromise between overly soft clustering and HCM.
3.2.2 Lake Hongze
Lake Hongze shows the typical spatial pattern of water color described by Cao et al. (2017), in which both the estuary and the lake center are turbid. Correspondingly, these regions are classified as Clusters 5 and 7 (Fig. 6(b)). The productive turbid waters of Clusters 1 and 2 are distributed in the western areas, which contain macrophytes and higher concentrations of phytoplankton and CDOM than other parts of Lake Hongze. Relatively clean waters (Clusters 3 and 4) are found in calm bays, where sediment resuspension is weak owing to topographic and environmental conditions. Pixels in two rivers in the south were classified as Cluster 6, but we do not consider them pure water signals, because narrow rivers are vulnerable to stray light from adjacent land or vegetation.
3.2.3 Lake Erhai
Lake Erhai is considered a clean inland lake with low reflectance and appears black in the true color image (Fig. 7(a)). The lake is mainly classified as Clusters 3 and 4, located in its central and southern parts (Fig. 7(b)), while Clusters 5 and 7 occur in the northern part. This can be explained by the high wind speed in the north, which causes sediment resuspension and higher CTSM. In contrast, Cluster 7 in the south is mainly caused by runoff from the west of Lake Erhai, consistent with the river plume observed in our previous study. Lake Erhai is not highly productive, as there are no apparent high values in the membership maps of Clusters 1, 2, and 6 (Fig. 7(c), 7(d), and 7(h)). Water classified as Cluster 6 along the lakeshore is likely caused by submerged plants in shallow areas, although adjacency effects cannot be excluded.
4.1 Is the default fuzzifier m = 2 in FCM appropriate for water optical data sets?
Few studies have discussed the fuzzifier parameter m in water optical clustering. Moore et al. studied the influence of different m values in FCM on clustering results based on 159 coastal spectra and found that cluster quality begins to decline when m is 2 or more. This finding is consistent with our results, in which all three candidate fuzzy clustering methods produced softer membership distributions with m ≥ 2 (Fig. 3). Is it therefore reasonable to default to m = 2 in fuzzy clustering methods for water optical classification? Our results suggest that m should instead be adjusted to the data set, for the following reasons. First, m = 2 simplifies the FCM computation and yields consistent fuzzy results for low-dimensional data sets, as reported by previous studies [31,32,47,51,52]. However, for complex FCM applications involving high-dimensional data, such as DNA microarray or spectral data [32,53], m = 2 gives overly soft results in which an object assigns considerable membership to clusters it does not belong to. This is detrimental to FCM-based applications such as retrieval algorithm blending [12,19,20,29], even if the clustering itself does not change. Second, when spectrally vegetation-like waters are included in the training data set, FCM with m = 2 cannot separate their spectra. Algal blooms are very common in eutrophic lakes, and even large-scale blooms have been reported [19,22,33]; we therefore believe that bloom water, as an optical water type of inland aquatic systems, should be included in the clustering results. However, bloom spectra do not appear as a separate type in previous studies using fuzzy clustering (or soft classification) methods, which we attribute to an inappropriate m value.
By contrast, the HCM-based clustering method of Spyrakos et al. did obtain these bloom spectra. Finally, we assessed FCM-m on several published data sets [58,59] and AERONET-OC data (Table 3). The results indicate that mub depends only on the data set itself and is then transformed into mused, the value used in FCM. In general, the appropriate mused is closer to 1 when outlying samples (such as bloom waters) are included in the data set and when the dimension (number of bands) increases.
The failure of any single algorithm to retrieve OWC across all waters inspired water optical classification. However, classification strategies seem to focus mainly on the accuracy of the final blended algorithm [7,12,14,20]. Fig. 8 shows the membership degrees along a transect in Lake Taihu (Fig. 5) obtained by FCM with m = 1.36 and m = 2. The result with m = 2 (the default in previous studies) is clearly unreasonable, since it weakens the membership of in-cluster members and inflates the membership of other clusters. FCM with m = 2 also delays the transition from one cluster to another: in Fig. 8(b), the membership of Cluster 6 (the bloom water type) does not approach zero until pixel 25. Transitional waters are expected to share similar membership degrees between neighboring clusters, with the membership curves intersecting where one cluster gives way to another. The error accumulated from clusters with the second-highest membership degree (often called the neighbor) must be taken into account, since these clusters contribute substantially to the final blended result, especially for pixels near algal blooms in eutrophic lakes.
4.2 Can FCM-m improve the Chla estimation by blending algorithms?
Two widely used CChla estimation algorithms, the Band-Ratio (BR) and Three-Band algorithm (TBA), were selected to test whether algorithm blending based on FCM-m improves retrieval. Specifically, we used the equations parameterized by Gilerson et al. for coastal and inland water systems. The optimal algorithm for each optical water cluster was then selected according to its CChla retrieval performance, as elaborated in previous studies, and the algorithms were blended using the membership matrix from FCM with m = 1.36 (FCM-m) and with m = 2 (see Moore et al. and references therein).
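Membership-based blending can be sketched as a weighted average of the per-cluster algorithm outputs. The example uses hypothetical numbers to show how the membership assigned to a non-belonging cluster propagates that cluster's error into the blended value; the function name is hypothetical, and averaging in linear concentration space is an assumption.

```python
import numpy as np

def blend_estimates(U, estimates):
    """Membership-weighted blend of per-cluster Chla estimates.

    U         : (K, N) membership matrix; columns should sum to 1
    estimates : (K, N) Chla from the algorithm assigned to each cluster,
                evaluated for every object
    """
    U = np.asarray(U, dtype=float)
    E = np.asarray(estimates, dtype=float)
    # Dividing by the column sum keeps the blend valid if memberships were truncated
    return (U * E).sum(axis=0) / U.sum(axis=0)

# Hypothetical object: its own cluster's algorithm gives 10 mg/m3, while the
# neighbour cluster's algorithm (poorly suited to this water) gives 1000 mg/m3
E = [[10.0], [1000.0]]
print(blend_estimates([[0.90], [0.10]], E))   # softer memberships
print(blend_estimates([[0.99], [0.01]], E))   # stricter memberships
```

With the softer memberships the blended value is about 109 mg/m3, whereas the stricter memberships give about 19.9 mg/m3, illustrating why stricter assignment limits error from non-belonging clusters.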
We used MRPE (Median Relative Percent Error), bias, and MAE (Median Absolute Error) to assess the performance of the two algorithms. Following the recommendation of Seegers et al., all Chla concentrations were log-transformed before the error metrics were calculated. As shown in Table 4, TBA performed better in productive waters (Clusters 1 and 2), whereas BR obtained lower MRPE, bias, and MAE in relatively clean waters (Clusters 3 and 4). For turbid waters (Clusters 5 and 7), the two algorithms differ considerably, with TBA performing much better than BR in Cluster 5. For Cluster 6 waters with extremely high Chla, however, both TBA and BR show higher errors than in the other clusters (MRPE > 20%), mainly because of the lack of a dedicated CChla algorithm for Cluster 6. Because the blending strategy inevitably propagates errors from non-optimal algorithms into the final result, the goal is to reduce this redundant error; in particular, the effect of large errors in some water types (such as Cluster 6) on the final blended results must be considered. The blended results with m = 1.36 achieved better error metrics, for each cluster and for all clusters combined, than those with m = 2, mainly because FCM-m reduces the error from poorly performing types through stricter membership assignment. As reported by previous studies, the effect of yellow matter absorption on models using different bands is important for accurate Chla estimation. Because the properties of productive, clean, and turbid waters vary widely, it is reasonable to apply a specific algorithm to each water type so that the different model assumptions are satisfied.
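Because the exact formulas are not given here, the following sketch encodes one plausible reading of the three metrics on log10-transformed Chla: MRPE as the median of |log10 Ĉ − log10 C| / |log10 C| in percent, bias as the mean log10 residual, and MAE as the median absolute log10 residual. These definitions and the function name are assumptions; samples with CChla = 1 mg/m3 are excluded from MRPE because their log10 value is zero.

```python
import numpy as np

def chla_error_metrics(measured, estimated):
    """MRPE, bias and MAE on log10-transformed Chla (definitions are assumptions)."""
    o = np.log10(np.asarray(measured, dtype=float))
    e = np.log10(np.asarray(estimated, dtype=float))
    r = e - o                                   # log10 residuals
    nz = o != 0.0                               # relative error undefined where Chla = 1 mg/m3
    mrpe = 100.0 * np.median(np.abs(r[nz]) / np.abs(o[nz]))
    bias = np.mean(r)                           # mean log10 residual
    mae = np.median(np.abs(r))                  # median absolute log10 residual
    return {"MRPE": float(mrpe), "bias": float(bias), "MAE": float(mae)}

print(chla_error_metrics([10.0, 100.0], [100.0, 100.0]))
```

Working in log space keeps a factor-of-ten overestimate and a factor-of-ten underestimate equally penalized, which is the motivation usually given for log-transforming Chla before computing error metrics.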
Thus, rather than proposing a universal CChla algorithm at present, efforts should focus on improving accuracy for specific water types (such as Clusters 5 and 7, with low CChla but high turbidity, or Cluster 6, with extremely high CChla).
4.3 Comparison of different water optical clustering results
In this study, only Rrs spectra from inland waters were used for FCM training, rather than pooling open ocean, coastal, and inland waters. We consider this reasonable, as in the study of Spyrakos et al., which subdivided Case 2 waters into inland and coastal systems. The water optical clusters provided in this study complement the global optical water types; however, they do not include the “blue-water” type reported by previous studies [30,61], because this water type was not found in the inland systems we investigated, which may be a limitation of our data set. Further refinement of the method is therefore needed to justify the classification scheme as more data become available. Indeed, FCM-m is capable of distinguishing blue-water spectra by controlling the fuzzifier parameter, which may be demonstrated in future studies. We believe that FCM-m is an adaptive algorithm, since it depends on the input data itself, and we have made the FCM-m code public so that the community can improve it further.
Comparing our clustering results with those of Moore et al. (Fig. 9) revealed several similar clusters, although the two were derived from different training samples. No low-reflecting water type (such as TM2 in Fig. 9) was found in our results, mainly because the FCM input was normalized first. Eleveld et al. suggested that normalized spectra perform better as input for deep, low-reflecting clear lakes, whereas non-normalized spectra perform better for high-reflecting lakes with high sediment loads. Whether the input is normalized therefore shifts the emphasis of the clustering results, as reported by Jackson et al. and Vantrepotte et al., so the clusters obtained in this study may, to some extent, neglect some low-reflecting inland waters. However, because phytoplankton dominates clean (but not blue) waters, the spectral shapes of these waters are still represented in our results, for example in the waters of Lake Erhai and Lake Qiandao classified as Clusters 3 and 4.
4.4 Advantages, expansibility and limitations of FCM-m
We consider the advantages of this study to be as follows. First, the FCM method used for water optical clustering provides membership degrees, unlike hard clustering (HCM), and yields a more realistic spatial separation of water types in nature. Second, the water optical clusters obtained include hypereutrophic waters (i.e., the spectrally vegetation-like type), which better reflects the actual conditions of inland systems. Third, the fuzzifier parameter of FCM is optimized, which preserves high membership degrees for the clusters a spectrum actually belongs to. In future applications, this study could provide a more reasonable membership-based flag mask for satellite data pre-processing. The membership degrees could then be transformed into algorithm weights in a blending framework, which has been shown to improve the estimation performance of parameterized algorithms.
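To illustrate how the fuzzifier controls membership degrees and how those degrees can act as blending weights, the Python sketch below uses the standard FCM membership formula for fixed cluster centres, u_ki = 1 / Σ_j (d_ki / d_kj)^(2/(m−1)), followed by a simple membership-weighted average of per-cluster algorithm outputs. The centres, the per-cluster estimates, and the function names are hypothetical, and the blending scheme used in this study may differ in detail (for example, in how low memberships are treated).

```python
import numpy as np

def fcm_membership(x, centers, m):
    """Standard FCM membership of each sample to each (fixed) cluster centre.

    u[k, i] = 1 / sum_j (d[k, i] / d[k, j]) ** (2 / (m - 1)), Euclidean d.
    Smaller m (closer to 1) gives crisper memberships.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    c = np.atleast_2d(np.asarray(centers, dtype=float))
    d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)  # (n_samples, n_clusters)
    d = np.fmax(d, 1e-12)  # guard against division by zero when a sample sits on a centre
    ratio = (d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

def blend(estimates, u):
    """Membership-weighted blend of per-cluster algorithm outputs.

    estimates: (n_samples, n_clusters), output of the algorithm tuned for each cluster
    u: (n_samples, n_clusters), membership degrees
    """
    return np.sum(u * estimates, axis=1) / np.sum(u, axis=1)

# Hypothetical example: one sample lying closer to cluster 1 than to cluster 2.
centers = np.array([[0.0, 0.0], [4.0, 0.0]])
sample = np.array([[1.0, 0.0]])
per_cluster_estimates = np.array([[10.0, 50.0]])
for m in (2.0, 1.36):
    u = fcm_membership(sample, centers, m)
    print(m, u.round(4), blend(per_cluster_estimates, u).round(3))
```

With m = 2 the sample keeps a membership of 0.1 in the non-belonging cluster, which pulls the blended estimate toward that cluster's algorithm; with m = 1.36 that membership drops to about 0.002, so the blended value stays close to the output of the belonging cluster. This is the "stricter membership assignation" effect described in the text.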
To examine the expansibility and stability of FCM-m for other optical sensors, we resampled the hyperspectral Rrs spectra to six other sensors and used the results as FCM input, with the fuzzifier m taking values from a predefined set (1.1, 1.3, 1.5, 1.7, 2.0, 2.5, 3.0, 4.0, and 5.0). The six sensors (with the number of available bands) are OLI (Operational Land Imager, Landsat-8, 5 bands), VIIRS (Visible Infrared Imaging Radiometer Suite, Suomi NPP, 7), GOCI (Geostationary Ocean Color Imager, COMS, 8), MSI (Multispectral Instrument, Sentinel-2, 9), MODIS (Aqua/Terra, 13), and MERIS (Medium Resolution Imaging Spectrometer, ENVISAT, 13). Note that the number of clusters was fixed at 7, as in the OLCI result, which lets us focus on the effect of data set dimension (i.e., band number) and of the fuzzifier parameter m on the FCM results (Fig. 10). The optimized m_used increases as the band number decreases, which is consistent with the result of Yu et al. [51]. We therefore believe that, for a better clustering structure (or membership distribution), a reduced m value is recommended for remote sensing spectra with high dimensions (many bands), as is the case for DNA microarray data.
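Resampling hyperspectral Rrs to a multispectral sensor is commonly done by weighting each spectrum with the spectral response function (SRF) of every band and normalizing by the SRF area. The Python sketch below assumes Gaussian SRFs defined by a band centre and full width at half maximum (FWHM) on a uniform 1-nm grid; this is a simplification, and in practice the tabulated SRFs published for OLI, VIIRS, GOCI, MSI, MODIS, and MERIS should be used. The function names and band parameters are hypothetical.

```python
import numpy as np

def gaussian_srf(wl, center, fwhm):
    """Hypothetical Gaussian spectral response; real sensors publish tabulated SRFs."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wl - center) / sigma) ** 2)

def resample(wl, rrs, bands):
    """Band-averaged Rrs = integral(Rrs * SRF) / integral(SRF) for each band.

    wl: (n_wl,) hyperspectral wavelengths in nm, assumed on a uniform grid
    rrs: (n_samples, n_wl) hyperspectral spectra
    bands: list of (center_nm, fwhm_nm) tuples, one per multispectral band
    """
    wl = np.asarray(wl, dtype=float)
    rrs = np.atleast_2d(np.asarray(rrs, dtype=float))
    out = np.empty((rrs.shape[0], len(bands)))
    for b, (center, fwhm) in enumerate(bands):
        w = gaussian_srf(wl, center, fwhm)
        # On a uniform grid the ratio of sums equals the ratio of integrals.
        out[:, b] = rrs @ w / w.sum()
    return out

# Illustrative bands only (not the official parameters of any sensor).
wl = np.arange(400, 901, dtype=float)
spectra = np.vstack([np.full(wl.size, 0.01), 1e-5 * wl])
print(resample(wl, spectra, [(490.0, 20.0), (560.0, 20.0), (665.0, 10.0)]))
```

Because the same in situ spectra are projected onto each sensor's bands, differences in the FCM results across sensors can be attributed to the band configuration (dimension) rather than to the samples themselves.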
However, this study still has some limitations. Given that water optical clustering is a fit-for-purpose scheme, the m_used optimized by FCM-m may not be the best value (a finer step in Section 2.2.2 could yield a more accurate value). Furthermore, SIL.F only evaluates the clustering structure and ignores the actual benefit of the membership degrees, so quantitative evaluation criteria are still required to judge the rationality of the parameters. Nevertheless, after fine-tuning m_used on different data sets (not shown here), we found that the distribution of membership degrees did not change greatly, which indicates that using FCM-m to determine m is practical at present. Besides, water optical clustering is very sensitive to input data quality [12,14,62]. On the one hand, in situ spectra should strictly follow the same measurement protocol, so that the results of water optical clustering reflect differences in water properties rather than systematic errors from methods or instruments. On the other hand, when clustering is applied to satellite images, uncertainty in atmospheric correction can also cause clustering to fail, as reported by previous studies [12,21]. It may therefore be necessary to establish a fuzzy clustering framework based on Rayleigh-corrected reflectance. This is feasible because aerosol scattering reflectance could be added to the in situ spectra through radiative transfer models or established lookup tables.
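For reference, the SIL.F criterion discussed above is the fuzzy silhouette of Campello and Hruschka [54]. Each sample's crisp silhouette s_j = (b_j − a_j) / max(a_j, b_j) is computed from its highest-membership cluster, and then weighted by (u_pj − u_qj)^α, the gap between its largest and second-largest memberships. The Python sketch below reflects our reading of that definition, assuming Euclidean distances and the default α = 1; it is not the R implementation used in this study, and the function name is hypothetical. Note that it scores how well separated the crisp partition is, not whether the membership values themselves are useful, which is the limitation raised in the text.

```python
import numpy as np

def fuzzy_silhouette(x, u, alpha=1.0):
    """Fuzzy silhouette index (Campello & Hruschka, 2006), Euclidean distances.

    x: (n, p) data; u: (n, c) membership matrix.
    Crisp labels are taken as the highest-membership cluster; each sample's
    silhouette is weighted by (u_first - u_second) ** alpha.
    """
    x = np.asarray(x, dtype=float)
    u = np.asarray(u, dtype=float)
    labels = np.argmax(u, axis=1)
    clusters = np.unique(labels)
    if clusters.size < 2:
        raise ValueError("at least two non-empty crisp clusters are required")
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    s = np.zeros(len(x))
    for j in range(len(x)):
        same = labels == labels[j]
        same[j] = False
        if not same.any():
            continue  # singleton cluster: silhouette taken as 0
        a = d[j, same].mean()  # mean distance to the rest of its own cluster
        b = min(d[j, labels == k].mean() for k in clusters if k != labels[j])
        s[j] = (b - a) / max(a, b)
    top2 = np.sort(u, axis=1)[:, -2:]  # columns: second-largest, largest
    w = (top2[:, 1] - top2[:, 0]) ** alpha
    return float(np.sum(w * s) / np.sum(w))
```

A grid search over candidate m values (as in the predefined set used in this study) would run FCM for each m and keep the value with the highest index; a finer grid simply adds more candidates.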
At present, fuzzy clustering is widely used in water optical classification with the default fuzzifier parameter. However, this preset value may not be suitable for all types of water bodies. We therefore proposed an improved FCM method, FCM-m, for water optical clustering. The new method optimizes the fuzzifier parameter m using an upper bound (m_ub) that depends on the input data set. Using a widely distributed in situ data set from inland waters in China, we generated seven representative water optical clusters with FCM-m. The method was also evaluated on atmospherically corrected image scenes ranging from low-reflecting to sediment- or phytoplankton-dominated waters. Our results can be considered an extension of the current global optical water types for inland waters, since we took the spectrally vegetation-like water type into account. By testing three additional oceanic and coastal data sets, we found that a reduced m is recommended for fuzzy clustering and that the optimized m gradually approaches 1 as the band number increases. Further, thanks to the stricter membership assignment rule of FCM-m, algorithm blending based on FCM-m performs better than blending based on the original FCM. Finally, we believe that FCM-m is an adaptive algorithm; its R code is available at https://github.com/bishun945, and it needs to be tested on more public data sets.
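A data-dependent upper bound on m can be illustrated with the criterion of Yu et al. [51]: with C_X = (1/n) Σ_k (x_k − x̄)(x_k − x̄)^T / ‖x_k − x̄‖², the fuzzifier should satisfy m < 1 / (1 − 2λ_max(C_X)) when λ_max < 0.5; otherwise the criterion gives no finite bound. The Python sketch below is our reading of that criterion and an assumption, not the FCM-m source code (which is in R at the URL above); the function name is hypothetical. Since the trace of C_X equals 1, λ_max is close to 1/p for roughly isotropic p-band data, so the bound shrinks toward 1 as the band number grows, in line with the trend reported above.

```python
import numpy as np

def m_upper_bound(x):
    """Fuzzifier upper bound after Yu et al. (2004), as we read it.

    C_X = (1/n) * sum_k (x_k - mean)(x_k - mean)^T / ||x_k - mean||^2
    m_ub = 1 / (1 - 2 * lambda_max(C_X)) if lambda_max < 0.5, else inf.
    x: (n_samples, n_bands) data matrix.
    """
    x = np.asarray(x, dtype=float)
    dev = x - x.mean(axis=0)
    norms2 = np.sum(dev ** 2, axis=1)
    keep = norms2 > 0  # a sample exactly at the mean has an undefined term; skip it
    dev, norms2 = dev[keep], norms2[keep]
    c_x = (dev.T / norms2) @ dev / len(x)
    lam_max = np.linalg.eigvalsh(c_x).max()
    return 1.0 / (1.0 - 2.0 * lam_max) if lam_max < 0.5 else np.inf

# Synthetic isotropic data: the bound drops toward 1 as the dimension grows.
rng = np.random.default_rng(0)
for p in (5, 9, 13):
    print(p, round(m_upper_bound(rng.normal(size=(2000, p))), 3))
```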
National Key R&D Program of China (2017YFB0503902); National Natural Science Foundation of China (41671340, 41701412, 41701423); Major Science and Technology Program for Water Pollution Control and Treatment (2017ZX07302-003); Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX18_1205).
We greatly appreciate Prof. Dick Brus, Dr. Thomas Jackson, and Dr. Doulaye Dembele for providing constructive suggestions on the fuzzy clustering method. We would also like to thank the three anonymous reviewers for their careful reading of the text and for many useful suggestions.
1. A. Morel and L. Prieur, “Analysis of variations in ocean color,” Limnol. Oceanogr. 22(4), 709–722 (1977).
2. C. D. Mobley, “Optical modeling of ocean waters: Is the case 1-case 2 classification still useful?” Oceanography 17(2), 60–67 (2004).
3. B. N. Seegers, R. P. Stumpf, B. A. Schaeffer, K. A. Loftin, and P. J. Werdell, “Performance metrics for the assessment of satellite data products: an ocean color case study,” Opt. Express 26(6), 7404–7422 (2018).
4. J. Lin, H. Lyu, S. Miao, Y. Pan, Z. Wu, Y. Li, and Q. Wang, “A two-step approach to mapping particulate organic carbon (POC) in inland water using OLCI images,” Ecological Indicators 90, 502–512 (2018).
5. K. Xue, R. Ma, H. Duan, M. Shen, E. Boss, and Z. Cao, “Inversion of inherent optical properties in optically complex waters using sentinel-3A/OLCI images: A case study using China's three largest freshwater lakes,” Remote Sens. Environ. 225, 328–346 (2019).
6. D. Sun, Y. Li, Q. Wang, C. Le, C. Huang, and K. Shi, “Development of optical criteria to discriminate various types of highly turbid lake waters,” Hydrobiologia 669(1), 83–104 (2011).
7. K. Xue, R. Ma, D. Wang, and M. Shen, “Optical Classification of the Remote Sensing Reflectance and Its Application in Deriving the Specific Phytoplankton Absorption in Optically Complex Lakes,” Remote Sens. 11(2), 184 (2019).
8. Y. Zhang, Y. Zhou, K. Shi, B. Qin, X. Yao, and Y. Zhang, “Optical properties and composition changes in chromophoric dissolved organic matter along trophic gradients: Implications for monitoring and assessing lake eutrophication,” Water Res. 131, 255–263 (2018).
9. C. Hu, Z. Lee, and B. Franz, “Chlorophyll a algorithms for oligotrophic oceans: A novel approach based on three-band reflectance difference,” J. Geophys. Res.: Oceans 117(C1), 117 (2012).
10. G. Zheng and P. M. DiGiacomo, “Uncertainties and applications of satellite-derived coastal water quality products,” Prog. Oceanogr. 159, 45–72 (2017).
11. T. S. Moore, J. W. Campbell, and M. D. Dowell, “A class-based approach to characterizing and mapping the uncertainty of the MODIS ocean chlorophyll product,” Remote Sens. Environ. 113(11), 2424–2430 (2009).
12. T. S. Moore, M. D. Dowell, S. Bradt, and A. R. Verdu, “An optical water type framework for selecting and blending retrievals from bio-optical algorithms in lakes and coastal waters,” Remote Sens. Environ. 143, 97–111 (2014).
13. J. Pitarch, H. J. van der Woerd, R. J. Brewin, and O. Zielinski, “Optical properties of Forel-Ule water types deduced from 15 years of global satellite ocean color observations,” Remote Sens. Environ. 231, 111249 (2019).
14. T. Jackson, S. Sathyendranath, and F. Mélin, “An improved optical classification scheme for the Ocean Colour Essential Climate Variable and its applications,” Remote Sens. Environ. 203, 152–161 (2017).
15. N. G. Jerlov, “Classification of sea water in terms of quanta irradiance,” ICES J. Mar. Sci. 37(3), 281–287 (1977).
16. N. G. Jerlov and F. F. Koczy, Photographic measurements of daylight in deep water (Elanders boktr., 1951).
17. M. G. Solonenko and C. D. Mobley, “Inherent optical properties of Jerlov water types,” Appl. Opt. 54(17), 5392–5401 (2015).
18. C. Le, Y. Li, Y. Zha, D. Sun, C. Huang, and H. Zhang, “Remote estimation of chlorophyll a in optically complex waters based on optical classification,” Remote Sens. Environ. 115(2), 725–737 (2011).
19. F. Zhang, J. Li, Q. Shen, B. Zhang, C. Wu, Y. Wu, G. Wang, S. Wang, and Z. Lu, “Algorithms and Schemes for Chlorophyll a Estimation by Remote Sensing and Optical Classification for Turbid Lake Taihu, China,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 8(1), 350–364 (2015).
20. M. E. Smith, L. R. Lain, and S. Bernard, “An optimized Chlorophyll a switching algorithm for MERIS and OLCI in phytoplankton-dominated waters,” Remote Sens. Environ. 215, 217–227 (2018).
21. M. Eleveld, A. Ruescas, A. Hommersom, T. Moore, S. Peters, and C. Brockmann, “An optical classification tool for global lake waters,” Remote Sens. 9(5), 420 (2017).
22. M. W. Matthews and D. Odermatt, “Improved algorithm for routine monitoring of cyanobacteria and eutrophication in inland and near-coastal waters,” Remote Sens. Environ. 156, 374–382 (2015).
23. K. Shi, Y. Li, Y. Zhang, L. Li, H. Lv, and K. Song, “Classification of inland waters based on bio-optical properties,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 7(2), 543–561 (2014).
24. F. Mélin and V. Vantrepotte, “How optically diverse is the coastal ocean?” Remote Sens. Environ. 160, 235–251 (2015).
25. N. R. Pal, K. Pal, and J. C. Bezdek, “A mixed c-means clustering model,” in Proceedings of 6th International Fuzzy Systems Conference, (IEEE, 1997), 11–21.
26. X. Wu, B. Wu, J. Sun, and H. Fu, “Unsupervised possibilistic fuzzy clustering,” J. Comput. Sci. 7, 1075–1080 (2010).
27. W. R. Tobler, “A computer movie simulating urban growth in the Detroit region,” Economic Geography 46, 234–240 (1970).
28. L. A. Zadeh, “Fuzzy sets,” Information and Control 8(3), 338–353 (1965).
29. V. Vantrepotte, H. Loisel, D. Dessailly, and X. Mériaux, “Optical classification of contrasted coastal waters,” Remote Sens. Environ. 123, 306–323 (2012).
30. E. Spyrakos, R. O’Donnell, P. D. Hunter, C. Miller, M. Scott, S. G. Simis, C. Neil, C. C. Barbosa, C. E. Binding, and S. Bradt, “Optical types of inland and coastal waters,” Limnol. Oceanogr. 63(2), 846–870 (2018).
31. D. Dembele, “Multi-objective optimization for clustering 3-way gene expression data,” Adv. Data Anal. Classif. 2(3), 211–225 (2008).
32. D. Dembele and P. Kastner, “Fuzzy C-means method for clustering microarray data,” Bioinformatics 19(8), 973–980 (2003).
33. M. Mu, C. Wu, Y. Li, H. Lyu, S. Fang, X. Yan, G. Liu, Z. Zheng, C. Du, and S. Bi, “Long-term observation of cyanobacteria blooms using multi-source satellite images: a case study on a cloudy and rainy lake,” Environ. Sci. Pollut. Res. 26(11), 11012–11028 (2019).
34. J. L. Mueller, C. Davis, R. Arnone, R. Frouin, K. Carder, Z. Lee, R. Steward, S. Hooker, C. D. Mobley, and S. McLean, “Above-water radiance and remote sensing reflectance measurements and analysis protocols,” Ocean Optics protocols for satellite ocean color sensor validation Revision 2, 98–107 (2000).
35. S. Bi, Y. Li, Q. Wang, H. Lyu, G. Liu, Z. Zheng, C. Du, M. Mu, J. Xu, S. Lei, and S. Miao, “Inland Water Atmospheric Correction Based on Turbidity Classification Using OLCI and SLSTR Synergistic Observations,” Remote Sens. 10(7), 1002 (2018).
36. K. Shi, Y. Li, L. Li, and H. Lu, “Absorption characteristics of optically complex inland waters: Implications for water optical classification,” J. Geophys. Res. Biogeosci. 118(2), 860–874 (2013).
37. Z. Zheng, J. Ren, Y. Li, C. Huang, G. Liu, C. Du, and H. Lyu, “Remote sensing of diffuse attenuation coefficient patterns from Landsat 8 OLI imagery of turbid inland waters: a case study of Dongting Lake,” Sci. Total Environ. 573, 39–54 (2016).
38. Y. Zhang, E. Zhang, and M. Liu, “Spectral absorption properties of chromophoric dissolved organic matter and particulate matter in Yunnan Plateau lakes,” J. Lake Sci. 21(2), 255–263 (2009).
39. B. G. Mitchell, M. Kahru, J. Wieland, M. Stramska, and J. Mueller, “Determination of spectral absorption coefficients of particles, dissolved material and phytoplankton for discrete water samples,” Ocean optics protocols for satellite ocean color sensor validation Revision 3, 231 (2002).
40. Z. Cao, H. Duan, L. Feng, R. Ma, and K. Xue, “Climate-and human-induced changes in suspended particulate matter over Lake Hongze on short and long timescales,” Remote Sens. Environ. 192, 98–113 (2017).
41. Z. Zheng, Y. Li, Y. Guo, Y. Xu, G. Liu, and C. Du, “Landsat-based long-term monitoring of total suspended matter concentration pattern change in the wet season for Dongting Lake, China,” Remote Sens. 7(10), 13975–13999 (2015).
42. X. Hou, L. Feng, H. Duan, X. Chen, D. Sun, and K. Shi, “Fifteen-year monitoring of the turbidity dynamics in large lakes and reservoirs in the middle and lower basin of the Yangtze River, China,” Remote Sens. Environ. 190, 107–121 (2017).
43. K. Xue, Y. Zhang, R. Ma, and H. Duan, “An approach to correct the effects of phytoplankton vertical nonuniform distribution on remote sensing reflectance of cyanobacterial bloom waters,” Limnol. Oceanogr.: Methods 15(3), 302–319 (2017).
44. K. Shi, Y. Zhang, Y. Zhou, X. Liu, G. Zhu, B. Qin, and G. Gao, “Long-term MODIS observations of cyanobacterial dynamics in Lake Taihu: Responses to nutrient enrichment and meteorological factors,” Sci. Rep. 7(1), 40326 (2017).
45. B. Matsushita, W. Yang, G. Yu, Y. Oyama, K. Yoshimura, and T. Fukushima, “A hybrid algorithm for estimating the chlorophyll-a concentration across different trophic states in Asian inland waters,” ISPRS J. Photogramm. and Remote Sensing 102, 28–37 (2015).
46. Y. Zhang, K. Shi, Y. Zhang, M. J. Moreno-Madriñán, G. Zhu, Y. Zhou, and X. Yao, “Long-term change of total suspended matter in a deep-valley reservoir with HJ-1A/B: implications for reservoir management,” Environ. Sci. Pollut. Res. 26(3), 3041–3054 (2019).
47. J. C. Bezdek, Pattern recognition with fuzzy objective function algorithms (Springer Science & Business Media, 2013).
48. R Development Core Team, R: Probabilistic and Possibilistic Cluster Analysis, 2019.
49. R Development Core Team, R: Fuzzy Clustering, 2019.
50. W. Wang and Y. Zhang, “On fuzzy cluster validity indices,” Fuzzy Sets and Systems 158(19), 2095–2117 (2007).
51. J. Yu, Q. Cheng, and H. Huang, “Analysis of the weighting exponent in the FCM,” IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 34(1), 634–639 (2004).
52. M. R. Rezaee, B. P. Lelieveldt, and J. H. Reiber, “A new cluster validity index for the fuzzy c-mean,” Pattern Recognit. Lett. 19(3-4), 237–246 (1998).
53. T. S. Moore, J. W. Campbell, and H. Feng, “A fuzzy logic classification scheme for selecting and blending satellite ocean color algorithms,” IEEE T. Geosci. Remote Sensing 39(8), 1764–1776 (2001).
54. R. J. Campello and E. R. Hruschka, “A fuzzy extension of the silhouette width criterion for cluster analysis,” Fuzzy Sets and Systems 157(21), 2858–2875 (2006).
55. S. Bi, Y. Li, H. Lu, L. Zhu, M. Mu, S. Lei, S. Wen, and X. Ding, “Estimation of chlorophyll-a concentration in Lake Erhai based on OLCI data,” J. Lake Sci. 30(3), 701–712 (2018).
56. Z. Cao, H. Duan, M. Shen, R. Ma, K. Xue, D. Liu, and Q. Xiao, “Using VIIRS/NPP and MODIS/Aqua data to provide a continuous record of suspended particulate matter in a highly turbid inland lake,” Int. J. Appl. Earth Obs. Geoinf. 64, 256–265 (2018).
57. X. Han, L. Feng, X. Chen, and H. Yesou, “MERIS observations of chlorophyll-a dynamics in Erhai Lake between 2003 and 2009,” Int. J. Remote Sensing 35(24), 8309–8322 (2014).
58. B. Nechad, K. Ruddick, T. Schroeder, K. Oubelkheir, D. Blondeau-Patissier, N. Cherukuru, V. Brando, A. Dekker, L. Clementson, and A. C. Banks, “CoastColour Round Robin data sets: a database to evaluate the performance of algorithms for the retrieval of water quality parameters in coastal waters,” Earth Syst. Sci. Data 7(2), 319–348 (2015).
59. A. Valente, S. Sathyendranath, V. Brotas, S. Groom, M. Grant, M. Taberner, D. Antoine, R. Arnone, W. M. Balch, and K. Barker, “A compilation of global bio-optical in situ data for ocean-colour satellite applications,” Earth Syst. Sci. Data 8(1), 235–252 (2016).
60. A. A. Gilerson, A. A. Gitelson, J. Zhou, D. Gurlin, W. Moses, I. Ioannou, and S. A. Ahmed, “Algorithms for remote estimation of chlorophyll-a in coastal and inland waters using red and near infrared bands,” Opt. Express 18(23), 24109–24125 (2010).
61. M. Hieronymi, D. Müller, and R. Doerffer, “The OLCI Neural Network Swarm (ONNS): A bio-geo-optical algorithm for open ocean and coastal waters,” Front. Mar. Sci. 4, 140 (2017).
62. J. Wei, Z. Lee, and S. Shang, “A system to measure the data quality of spectral remote-sensing reflectance of aquatic environments,” J. Geophys. Res.: Oceans 121, 8189–8207 (2016).
63. H. Liu, S. Hu, Q. Zhou, Q. Li, and G. Wu, “Revisiting effectiveness of turbidity index for the switching scheme of NIR-SWIR combined ocean color atmospheric correction algorithm,” Int. J. Appl. Earth Obs. Geoinf. 76, 1–9 (2019).