
Data augmentation using continuous conditional generative adversarial networks for regression and its application to improved spectral sensing

Open Access

Abstract

Machine learning-assisted spectroscopic analysis faces a prominent constraint in the form of insufficient spectral samples, which hinders its effectiveness. Meanwhile, there is a lack of effective algorithms for simulating synthetic spectra from limited real spectra for regression models in continuous scenarios. In this study, we introduce a continuous conditional generative adversarial network (CcGAN) to autonomously generate synthetic spectra. The labels employed for generating the spectral data can be selected arbitrarily from within the range of labels associated with the real spectral data. Our approach effectively produced spectra using a small spectral dataset obtained from a self-interference microring resonator (SIMRR)-based sensor. The generated synthetic spectra were evaluated using principal component analysis, which could not distinguish them from the real spectra. Finally, to enhance a deep neural network (DNN) regression model, these synthetic spectra were incorporated into the original training dataset as an augmentation technique. The results demonstrate that the synthetic spectra generated by the CcGAN exhibit exceptional quality and significantly improve the predictive performance of the DNN model. In conclusion, the CcGAN exhibits promising potential for generating high-quality synthetic spectra and delivers a superior data augmentation effect for regression tasks.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Spectroscopic techniques are used in a wide range of fields, including chemistry, physics, biology, environmental science, and materials science [1]. Common spectroscopic techniques include UV–Vis spectroscopy [2], infrared (IR) spectroscopy [3], Raman spectroscopy [4,5], mass spectrometry [6] and nuclear magnetic resonance (NMR) spectroscopy [7]. Spectroscopic techniques generate vast amounts of data, which can be challenging to analyze and interpret using traditional methods. In recent years, big data and deep learning approaches have been increasingly applied to the analysis of spectroscopic data, enabling faster and more accurate analysis and interpretation of the data [8–16]. However, obtaining high-quality training data can be expensive and difficult, especially for a specialized spectroscopic method. The reasons include the following: (1) collecting spectral data from a diverse range of sources or domains can be time-consuming and costly [17]; (2) some spectral data may be subject to privacy regulations, making them difficult or impossible to obtain [18]; (3) some spectral sensing tasks may require specialized knowledge or expertise to label or annotate data accurately [19]; (4) it can be challenging to ensure the quality of the spectral data, particularly if the data contain errors, bias, or noise, which can impact the performance of the prediction models [20]. Overall, the cost and difficulty of obtaining high-quality training spectral data are important factors to consider when developing machine learning models for spectroscopic techniques.

Among several methods, data augmentation is one of the most powerful techniques to mitigate these challenges [21,22]. It artificially increases the training set by creating additional synthetic examples from existing real data. Using generative models for data augmentation can be a powerful technique for improving the performance of machine learning models, especially in cases where limited training data are available [23,24]. However, it is important to ensure that the generated examples are realistic and do not introduce biases or distortions into the training data [25,26]. Such generative models can be used to generate new examples of data that are similar to the original data, and mainly include variational autoencoders (VAEs) and generative adversarial networks (GANs). For VAEs, one potential shortcoming is that they may not always produce high-quality samples, especially if the latent space is not well structured or if the decoder network is not sufficiently powerful [27,28]. Additionally, VAEs may not be able to capture all of the complex variations in the data, especially if the data have a high degree of complexity or variability. One advantage of using GANs for data augmentation is that they can generate highly realistic and diverse examples of the original data [23–26]. GANs consist of two networks: a generator and a discriminator [24]. The generator network takes random noise as input and generates new data samples, while the discriminator network takes both real and generated data samples as input and tries to distinguish between them. The generator and discriminator networks are trained together in an adversarial fashion: the generator tries to generate more realistic samples to fool the discriminator, and the discriminator tries to become better at distinguishing between real and generated samples.
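To make these adversarial mechanics concrete, the following is a minimal PyTorch sketch of one GAN training step. The networks `G` and `D`, the optimizers, and the tensor shapes are illustrative assumptions, not the models used later in this paper.

```python
import torch
import torch.nn.functional as F

def gan_step(G, D, real, opt_G, opt_D, noise_dim=256):
    """One adversarial update, assuming D outputs a probability in (0, 1)."""
    batch = real.size(0)
    z = torch.randn(batch, noise_dim)

    # Discriminator update: push D(real) toward 1 and D(G(z)) toward 0.
    fake = G(z).detach()                      # detach: do not update G here
    d_loss = F.binary_cross_entropy(D(real), torch.ones(batch, 1)) \
           + F.binary_cross_entropy(D(fake), torch.zeros(batch, 1))
    opt_D.zero_grad(); d_loss.backward(); opt_D.step()

    # Generator update: push D(G(z)) toward 1, i.e., fool the discriminator.
    g_loss = F.binary_cross_entropy(D(G(z)), torch.ones(batch, 1))
    opt_G.zero_grad(); g_loss.backward(); opt_G.step()
    return d_loss.item(), g_loss.item()
```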

In recent research, the use of GAN-generated data for enhancing classification performance in spectrum sensing has shown considerable success. Wang et al. proposed a spectral generation method for extending the spectral database of laser-induced breakdown spectroscopy based on GANs, and the accuracy of the classification model was typically improved by adding GAN-generated data to the training set [29]. A boundary equilibrium GAN (BEGAN) has been proposed to generate synthetic spectra with high quality and diversity, and the use of these synthetic spectra has improved predictive performance [30]. Recently, GAN-generated data have also shown promise in enhancing the performance of machine learning models for Raman spectra classification [31–33]. The basic idea is to use the GAN to generate synthetic Raman spectra that mimic the statistical characteristics of the real Raman spectra; these synthetic spectra can be used to augment the training set and improve the performance of the classification model. In brief, current applications of data augmentation have mainly focused on spectral classification problems. Little attention has been paid to spectral regression problems, which have many applications in fields such as chemistry, biochemistry, and materials science [34–36]. In this case, spectral data are used to predict continuous output variables. Typical applications include quantitative analysis of chemical components, characterization of materials, prediction of continuous biomarker variables in medical diagnosis, and quality control in manufacturing [37–39]. In these regression tasks, a GAN can be used to generate synthetic spectra with corresponding labels, including labels that are missing from the training dataset. Nevertheless, it is always difficult to generate a high-quality synthetic spectrum for a given missing label, so few studies have been conducted thus far.

Recently, a continuous conditional GAN (CcGAN) model has been proposed to generate diverse, high-quality image samples from the image distribution conditional on a given regression label [40–43]. In this paper, the CcGAN model is adopted to simulate synthetic spectra for small sample sets and perform data augmentation for regression. The chosen small sample set comprises multimode spectral data obtained from a self-interference microring resonator operating at varying temperatures [44]. The multimode spectral data are utilized by the machine learning model to make temperature predictions. The synthetic spectra are applied to evaluate the impact of data augmentation on the performance of the regression model.

2. Method and algorithm

In this section, the basic theory of the CcGAN is first introduced. Then, the specific method for generating spectral samples with the CcGAN is presented in detail. Finally, a specific method of applying the CcGAN to a regression task is described.

2.1 Generate spectral samples with continuous conditional labels using CcGAN

Conditional GANs (cGANs) are adopted to generate samples that are conditioned on some input variable or label, allowing for more control over the generated output [45]. In cGANs, the generator receives both random noise and additional conditional input, often represented as label or class information. This conditional input guides the generator to produce samples conditioned on the provided information. The discriminator in a cGAN is also conditioned on the same additional input and learns to differentiate between real and generated samples based on both their quality and their adherence to the provided condition. Although the cGAN incorporates additional conditioning information into the training process, this information is typically a discrete output variable, and the cGAN model is unsuitable for generating synthetic samples conditioned on continuous variables. On the one hand, the label embedding in a cGAN is constructed from the known label classes, but the number of label classes is infinite in continuous scenarios. On the other hand, the estimation of the loss functions is expressed through a Dirac function, which is not a good estimate in continuous scenarios [40].

CcGANs extend the conditioning to continuous variables, allowing for more fine-grained control over the generated synthetic spectra. The conditioning information is typically in the form of a continuous vector. As shown in Fig. 1(a), the architecture of the CcGAN for continuous labels mainly consists of four parts: a generator network, a discriminator network, a pre-trained network and a label embedding network. An improved label input mechanism has been proposed to better incorporate continuous regression labels into CcGAN in the continuous scenario [40]. The label input approach is performed by a pre-trained network and a label embedding network. Here, CcGAN is carefully designed to generate synthetic spectra under continuous labels. The following sections provide a detailed explanation of its implementation.


Fig. 1. The proposed spectral generation method under continuous labels based on a CcGAN. (a) A typical workflow of the proposed CcGAN framework for spectral generation. The residual block structure of (b) the generator and (c) the discriminator. (d) The pre-trained network on the left and the label embedding network on the right.


The generator network is composed of one linear layer and four gen-blocks (short for functional blocks in the generator network). As shown in Fig. 1(b), each gen-block has a distinct residual structure, with one skip connection bypassing its main route. The main route consists of two cascaded sub-blocks, numbered "Sub-block n" (n = 1, 2) in the gen-block or dis-block (short for functional block in the discriminator network). Each sub-block consists of a sequence of operations: conditional batch normalization (condBN), followed by a nonlinear activation function such as a rectified linear unit (ReLU), and a layer of neural network operations such as a convolution layer. In Sub-block 1, an upsample layer is used in conjunction with a convolution layer so that the generator is able to learn increasingly complex representations of the spectral data. The skip connection allows the input to bypass an upsample layer and a convolution layer and be added directly to the output of the block. With this bypass connection, the gradient can flow directly to the earlier layers, which helps to prevent the vanishing gradient problem. The input to the generator network is a random noise vector whose elements each follow a standard normal distribution.
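As an illustration of this structure, the following is a minimal PyTorch sketch of a conditional-batch-normalization layer and a residual gen-block, assuming a 128-dimensional label embedding h; the layer widths and class names are assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class CondBN(nn.Module):
    """Conditional batch norm: scale and shift come from the label embedding h."""
    def __init__(self, channels, embed_dim=128):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels, affine=False)
        self.gamma = nn.Linear(embed_dim, channels)
        self.beta = nn.Linear(embed_dim, channels)

    def forward(self, x, h):
        out = self.bn(x)
        g = self.gamma(h).unsqueeze(-1).unsqueeze(-1)   # per-channel scale
        b = self.beta(h).unsqueeze(-1).unsqueeze(-1)    # per-channel shift
        return (1 + g) * out + b

class GenBlock(nn.Module):
    """Residual gen-block: condBN -> ReLU -> upsample -> conv (Sub-block 1),
    condBN -> ReLU -> conv (Sub-block 2), plus a skip connection that also
    upsamples and matches the channel count."""
    def __init__(self, in_ch, out_ch, embed_dim=128):
        super().__init__()
        self.bn1, self.bn2 = CondBN(in_ch, embed_dim), CondBN(out_ch, embed_dim)
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.skip = nn.Conv2d(in_ch, out_ch, 1)
        self.up = nn.Upsample(scale_factor=2)

    def forward(self, x, h):
        y = self.conv1(self.up(torch.relu(self.bn1(x, h))))   # Sub-block 1
        y = self.conv2(torch.relu(self.bn2(y, h)))            # Sub-block 2
        return y + self.skip(self.up(x))                      # skip connection
```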

The discriminator network is composed of two linear layers and five dis-blocks. As shown in Fig. 1(c), a residual dis-block consists of one main route and one skip connection. The main route represents the primary flow of information through two cascaded sub-blocks. The skip connection allows the input of the block to be added directly to the output of the main route. In both the main route and the skip connection, a spectral norm (SN) layer is applied to prevent gradient explosion or vanishing during training and to stabilize the training of GANs [46]. In each residual dis-block except the first, an average pooling layer is used after one or more convolution layers to progressively reduce the spatial dimensions of the feature map at the end of both the main route and the skip connection. The discriminator takes both synthetic and real spectral images as input; the former are generated by the generator network, and the latter are obtained by transforming one-dimensional real spectral data into two-dimensional spectral images.
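A matching sketch of a residual dis-block follows, with spectral normalization on both routes and average pooling at the end of each; the same assumptions as in the previous sketch apply.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class DisBlock(nn.Module):
    """Residual dis-block: spectral-normalized convolutions on the main
    route, average pooling to halve the spatial size, and a matching
    spectral-normalized skip path."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv1 = spectral_norm(nn.Conv2d(in_ch, out_ch, 3, padding=1))
        self.conv2 = spectral_norm(nn.Conv2d(out_ch, out_ch, 3, padding=1))
        self.skip = spectral_norm(nn.Conv2d(in_ch, out_ch, 1))
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        y = self.conv1(torch.relu(x))              # Sub-block 1
        y = self.pool(self.conv2(torch.relu(y)))   # Sub-block 2 + downsample
        return y + self.pool(self.skip(x))         # skip route + downsample
```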

In Fig. 1(d) on the left, the pretrained network is partitioned into two sub-networks, T1 and T2. The sub-network T1 is trained to learn a mapping from an input spectral sample x to an output high-dimensional feature h. The sub-network T2 is trained to build the relationship between the extracted feature h and a regression label y. The pretrained network adopts a fully connected neural network with an input layer, two hidden layers and an output layer. It takes one-dimensional real spectral data as its input variable and the corresponding label as the output variable. The two hidden layers each contain the same number of neurons, matching the dimensionality of the high-dimensional feature h. The training set consists of many one-dimensional real spectral samples (x) paired with their corresponding labels (y) and is used to train the pretrained network. After training, the sub-network T2 can map the high-dimensional feature (h) to a regression label y.
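The following sketch shows one way to realize the T1/T2 split described above, assuming a 4096-point spectral input and a 128-dimensional feature h (the sizes used in Section 4.1); training would minimize the MSE between the predicted and true labels. The class name and layer choices are illustrative.

```python
import torch
import torch.nn as nn

class Pretrained(nn.Module):
    """T1: spectrum x -> high-dimensional feature h; T2: feature h -> label y."""
    def __init__(self, in_dim=4096, feat_dim=128):
        super().__init__()
        self.T1 = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU(),
                                nn.Linear(feat_dim, feat_dim), nn.ReLU())
        self.T2 = nn.Linear(feat_dim, 1)

    def forward(self, x):
        h = self.T1(x)           # extract the high-dimensional feature
        return self.T2(h), h     # label estimate and feature
```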

Aided by this trained T2 sub-network, the label embedding network (T3) can map a regression label y to a high-dimensional feature (h) in Fig. 1(a). In Fig. 1(d) on the right, the label embedding network T3 is depicted as a fully connected neural network with three layers: one input layer, one hidden layer and one output layer. The number of neurons in the input layer matches the dimensionality of the labels. Both the hidden layer and the output layer have the same number of neurons, equal to the dimensionality of the high-dimensional feature (h). During training, a label y is chosen as the input for the T3 network, and the resulting output is then fed into the trained T2 network as input. Following this mapping, the estimated output label $\hat{y}$ from the T2 sub-network is compared to the input label y of the T3 network, and the loss is computed to optimize the T3 network. The T3 network then undergoes iterative updates to its model parameters to minimize this loss. After training, the trained T3 network can map an arbitrary label to a high-dimensional feature h. The feature h is then input into the generator and discriminator networks by conditional batch normalization (condBN) and label projection, respectively. Small spectral datasets are adopted in this paper; compared to the original CcGAN used for image generation under continuous labels [40], T1 and T3 are kept small to avoid overfitting problems.
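A sketch of the T3 training loop described above, reusing the `Pretrained` module from the previous sketch: the frozen T2 provides the supervision signal, and the optimizer settings follow Section 4.1, while the batch construction is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

pre = Pretrained()               # pretrained T1/T2 from the previous sketch
for p in pre.parameters():
    p.requires_grad_(False)      # freeze the pretrained network

# T3: label (1-dim) -> hidden (128) -> feature h (128)
T3 = nn.Sequential(nn.Linear(1, 128), nn.ReLU(), nn.Linear(128, 128))
opt = torch.optim.SGD(T3.parameters(), lr=1e-4)

for epoch in range(500):
    y = torch.rand(64, 1)        # labels drawn from the normalized range [0, 1]
    h = T3(y)                    # label -> high-dimensional feature
    y_hat = pre.T2(h)            # frozen T2 maps the feature back to a label
    loss = F.mse_loss(y_hat, y)  # compare the estimate to the input label
    opt.zero_grad(); loss.backward(); opt.step()
```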

As previously mentioned, the loss functions in cGANs may provide an imprecise estimate for regression labels in the continuous scenario, especially when each regression label has only one corresponding spectral sample, as is common in spectral sensing [44]. To calculate the discriminator loss and generator loss, the key is to estimate the mixed density p(x,y) of the joint distribution of x and y [24]. Because only a small number of spectral samples exists for each distinct condition, the cGAN expression for p(x,y) is not a good estimate in the continuous scenario. Therefore, two novel estimates for p(x,y) were proposed in the CcGAN, termed the hard vicinal estimate (HVE) and the soft vicinal estimate (SVE) [40]. The HVE uses a sample to estimate p(x|y) only when the distance of its label from y is less than κ, where p(x|y) is the conditional probability density function of the sample distribution conditional on y, and κ is a positive hyperparameter. Instead of using only such samples as in an HVE, the SVE uses all available samples to estimate p(x|y) by assigning a weight to each sample, where the weight varies inversely with the distance of its label from y. The estimation accuracy of the SVE is thus higher than that of the HVE, so the SVE is adopted in the proposed CcGAN-based model to accurately estimate the spectral sample distribution conditional on labels that do not appear in the training set. The mixed probability p(x,y) can then be effectively estimated from the conditional probability p(x|y) obtained through the SVE. Consequently, a soft vicinal discriminator loss (SVDL) is derived as follows [40]:

$$\begin{aligned} \hat{\mathcal{L}}^{SVDL}(D) &= -\frac{1}{N^r}\sum_{j=1}^{N^r}\sum_{i=1}^{N^r} \mathbb{E}_{\varepsilon^r \sim \mathcal{N}(0,\sigma^2)}\left[ e^{-[y_i^r - (y_j^r + \varepsilon^r)]^2/\kappa^2} \log\big( D(x_i^r, y_j^r + \varepsilon^r) \big) \right] \\ &\quad -\frac{1}{N^g}\sum_{j=1}^{N^g}\sum_{i=1}^{N^g} \mathbb{E}_{\varepsilon^g \sim \mathcal{N}(0,\sigma^2)}\left[ e^{-[y_i^g - (y_j^g + \varepsilon^g)]^2/\kappa^2} \log\big( 1 - D(x_i^g, y_j^g + \varepsilon^g) \big) \right] \end{aligned}$$
where $N^r$ and $N^g$ are the numbers of real and fake samples, respectively; $\varepsilon^r$ and $\varepsilon^g$ are two corresponding noise variables following a normal distribution with mean 0 and standard deviation $\sigma$; $x_i^r$ and $x_i^g$ are the i-th real and fake spectral samples, respectively; and $y_i^r$ and $y_i^g$ are the labels of $x_i^r$ and $x_i^g$, respectively. Here, the input target label y satisfies $y \triangleq y_j^r + \varepsilon^r$ and $y \triangleq y_j^g + \varepsilon^g$ for real and fake spectral samples, respectively. This means that the label y provided to the generator and the discriminator has a certain deviation from $y_i^r$ and $y_i^g$. The term $e^{-[y_i^r - (y_j^r + \varepsilon^r)]^2/\kappa^2}$ or $e^{-[y_i^g - (y_j^g + \varepsilon^g)]^2/\kappa^2}$ represents the weight assigned to the real or fake spectral sample in the SVE: the closer the label $y_i^r$ ($y_i^g$) of the sample is to the input target label y, the greater the assigned weight. The positive hyperparameter $\sigma$ satisfies $\sigma = (4\hat{\sigma}_{y^r}^5/3N^r)^{1/5}$, where $\hat{\sigma}_{y^r}$ is the sample standard deviation of the normalized labels in the training set. The positive hyperparameter $\kappa$ satisfies $\kappa = \max(y_{[2]}^r - y_{[1]}^r, y_{[3]}^r - y_{[2]}^r, \ldots, y_{[N_{uy}^r]}^r - y_{[N_{uy}^r - 1]}^r)$, where $y_{[l]}^r$ is the l-th smallest normalized distinct real label and $N_{uy}^r$ is the number of normalized distinct labels in the training set. Because the loss assigns a different weight to every training sample, the conditional distribution can still be estimated accurately even when the training data are scarce.
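As a concrete reading of the SVDL above, the sketch below computes the soft vicinal weights and the discriminator loss for a single noise-perturbed target label; the conditional discriminator signature `D(x, y)` and the tensor shapes are assumptions.

```python
import torch

def vicinal_weights(labels, target, kappa):
    """w_i = exp(-(y_i - target)^2 / kappa^2): samples whose labels lie
    closer to the perturbed target label receive larger weights."""
    return torch.exp(-((labels - target) ** 2) / kappa ** 2)

def svdl_loss(D, x_real, y_real, x_fake, y_fake, target, kappa, eps=1e-8):
    """Discriminator loss contribution for one perturbed target label."""
    t_r = target.expand_as(y_real)     # broadcast the target to the batch
    t_f = target.expand_as(y_fake)
    w_r = vicinal_weights(y_real, target, kappa)
    w_f = vicinal_weights(y_fake, target, kappa)
    real_term = -(w_r * torch.log(D(x_real, t_r) + eps)).mean()
    fake_term = -(w_f * torch.log(1.0 - D(x_fake, t_f) + eps)).mean()
    return real_term + fake_term
```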

The novel CcGAN generator loss is derived as follows:

$$\hat{\mathcal{L}}(G) = -\frac{1}{N^g}\sum_{i=1}^{N^g} \mathbb{E}_{\varepsilon^g \sim \mathcal{N}(0,\sigma^2)} \log\big( D\big( G(z_i, y_i^g + \varepsilon^g), y_i^g + \varepsilon^g \big) \big)$$
where $z_i$ is a random noise vector following a standard normal distribution. Gaussian noise is added to the existing labels so that the generator is trained to generate samples for labels it has never seen.
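Under the same assumptions as the previous sketch, the generator loss above reduces to the following, where `G(z, y)` generates a sample conditioned on a noise-perturbed label.

```python
import torch

def generator_loss(G, D, z, y_perturbed, eps=1e-8):
    """The generator maximizes log D at samples generated on the
    noise-perturbed labels, i.e., it minimizes -log D."""
    x_fake = G(z, y_perturbed)
    return -torch.log(D(x_fake, y_perturbed) + eps).mean()
```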

2.2 Applying synthetic spectra generated by CcGAN in the regression model

Figure 2(a) shows the flow chart of the CcGAN-based data augmentation in a regression model for spectral sensing. Once the CcGAN generates synthetic spectra with continuous conditional labels, these generated samples can be merged with the original training dataset, thereby enhancing its size and diversity. By expanding the training dataset with synthetic spectral samples, the data augmentation method can help improve the performance and generalization of machine learning models. However, it is important to note that the quality and effectiveness of CcGAN-generated data depend on the training procedure, CcGAN architecture, and quality of the original dataset.


Fig. 2. Apply synthetic spectra generated by CcGAN in the regression model. (a) Flow chart of CcGAN-based data augmentation in a regression model for spectral sensing. (b) Illustration of the leave-one-out method when expanding the dataset with synthetic spectral samples.


In the CcGAN, all labels are normalized to ensure consistency and comparability between different data samples, so the input continuous label y takes a value between 0 and 1. For convenience, we assign the input label values as $m/(M - 1)$, where M is the total number of synthetic spectra and m is an integer ($m = 0, 1, \ldots, M - 1$). Due to the limited size of the original spectral dataset, this paper employs the leave-one-out method, which maximizes the use of the available data for both training and testing, to assess the performance of the machine learning model via metrics such as the mean squared error (MSE). It is a special case of k-fold cross-validation in which the number of folds equals the number of spectral samples in the original dataset. Figure 2(b) shows an illustration of the leave-one-out method. The total dataset consists of real and synthetic data. The real data are partitioned into N observations, where N is the total number of observations in the original dataset. Each observation consists of a single spectral sample, and the model is trained on the remaining N−1 real samples from the original dataset along with M synthetic samples generated by the CcGAN model. The trained model is then used to predict the outcome for the test sample that was left out. This process is repeated N times, each time leaving out a different test sample.
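A sketch of this augmented leave-one-out procedure follows; `train_and_eval` stands for training a fresh regression model and returning the squared error on the held-out sample, and all names are illustrative.

```python
import numpy as np

def loo_mse(real_x, real_y, synth_x, synth_y, train_and_eval, repeats=20):
    """Leave-one-out MSE with CcGAN augmentation: each fold trains on
    N-1 real samples plus M synthetic samples and tests on the one left out."""
    N = len(real_x)
    errors = np.zeros(N)
    for i in range(N):
        mask = np.arange(N) != i
        X = np.concatenate([real_x[mask], synth_x])   # N-1 real + M synthetic
        Y = np.concatenate([real_y[mask], synth_y])
        runs = [train_and_eval(X, Y, real_x[i], real_y[i])
                for _ in range(repeats)]              # average out randomness
        errors[i] = np.mean(runs)
    return errors.mean()                              # final MSE
```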

3. Experiment

To discuss the feasibility of the CcGAN applied in spectral sensing for a regression task, a small dataset is used in this work. The dataset consists of a set of ergodic spectra paired with their corresponding temperature labels. These data are collected by a temperature measurement system based on a self-interference microring resonator (SIMRR). Benefiting from its special intensity sensing mechanism, the SIMRR allows multimode sensing empowered by machine learning over a wide range of wavelengths [47–49]. Multimode sensors can also provide more comprehensive information about the sample being measured by simultaneously monitoring multiple parameters or analytes. Machine learning algorithms are applied to fuse multimode sensing information to offer enhanced accuracy, selectivity, and discrimination capabilities [44,50–54].

3.1 Experimental setup

The experimental setup for SIMRR-based ergodic spectra sensing has been reported in detail in our previous work [44], so only a brief description is given here. The SIMRR-based sensor configuration involves a microring resonator that is coupled twice by a sensing arm waveguide. Instead of relying on frequency shifts as the sensing mechanism, the change in extinction ratio is utilized for dissipative sensing. An SIMRR sensor was fabricated on a Si3N4 wafer. In this structure, the microring resonator has a radius R = 100 µm, and the sensing arm waveguide has an initial length L = 250 µm. A metal thin-film microheater was installed on the surface of the sensing arm waveguide. The temperature of the sensing arm waveguide rises when a voltage is applied to the microheater. An additional phase is then generated by the sensing arm waveguide due to thermo-optic effects, which modifies the interaction between the light field in the microring resonator and the sensing arm waveguide. This in turn leads to different extinction changes in multiple modes.

Figure 3(a) illustrates the schematic of the experimental setup. A broadband incoherent light source (over a 120 nm bandwidth) is used to excite the measurement system. A fiber polarization controller (FPC) placed after the light source is used to control and manipulate the polarization state of light traveling through the optical fiber. Two tapered fibers are used separately to couple light into and out of the SIMRR-based sensor chip. Finally, an optical spectrum analyzer (OSA, 0.02 nm resolution) is connected to capture the spectral response of the SIMRR-based sensor.

3.2 Sample preparation

An applied voltage is loaded on the microheater to modulate the transmittance of the SIMRR-based sensor. The voltage is generated by a direct current (DC) source and can be tuned within the range of 0 to 3.8 V with a precision of 0.1 V. Due to the limited resolution of the DC source, spectral data were acquired for 39 distinct voltages, resulting in a total of 39 sets of data. Because it is difficult to directly calibrate the temperature of the buried sensing arm waveguide, the applied voltage is adopted as the detection target (label). This small dataset was previously taken as an example to explain multimode sensing aided by machine learning [44]. Figure 3(b) shows a typical measured transmission spectrum at the applied voltage U = 1.1 V. Many transmission dips are present in the spectrum, and each transmission dip represents a resonant mode inside the SIMRR. The collected spectra show that each transmission dip has a different response to the change in applied voltage, in both its transmission extinction and its resonant wavelength. SIMRR-based ergodic spectra sensing can also be applied in various fields, including environmental monitoring, biomedical sensing, chemical analysis, and gas sensing. In these application scenarios, continuous target parameters are precalibrated as labels for a regression task, and the data augmentation techniques involved are exactly the same as those applied to the small dataset collected with the experimental setup in Fig. 3(a). Hence, this article conducts a study on ergodic spectra sensing and data augmentation techniques using this limited set of 39 spectra.


Fig. 3. SIMRR-based ergodic spectra sensing aided by machine learning. (a) Schematic of the experimental setup. SIMRR chip: self-interference microring resonator chip; FPC: fiber polarization controller; OSA: optical spectrum analyzer. (b) A typical measured transmission spectrum at the applied voltage U = 1.1 V. (c) Workflow schematic employing a deep neural network (DNN) for temperature prediction.


In many practical measurement scenarios, collecting a large amount of training data before the actual measurement can be challenging, costly, and time-consuming, while only a small amount of new test data is available when performing the measurement in real time. In such cases, transfer learning becomes essential to achieve high prediction accuracy for the new test data. Although the generalized regression neural network (GRNN) has achieved excellent results in merging multimode sensing information, it does not support transfer learning [55–57]. A DNN, by contrast, can be used effectively for regression tasks, leveraging its ability to learn complex patterns and relationships within spectral data. One remarkable advantage of a DNN is that the resulting model has a certain transfer learning ability in real measurement scenarios; this is beyond the scope of this paper and will be considered in future work. Here, we focus on data augmentation using CcGANs for a DNN-based regression task.

Figure 3(c) shows the workflow schematic employing a DNN for voltage (temperature) prediction in SIMRR-based ergodic spectra sensing. The input variables represent intensity values in the spectra, and the output variables are either temperature (voltage) labels during training or temperature (voltage) predictions during testing. With a small training set, it is difficult to build an accurate DNN-based regression model. Therefore, we effectively expand the dataset and increase its diversity by augmenting the small training set with synthetic spectral data generated by the CcGAN. The experimental spectra are one-dimensional data with 20,480 sample points, but the input of the discriminator is a 1 × 64 × 64 image sample. To match this data format, each one-dimensional spectrum is divided into five parts of 4096 sample points each (Fig. 3(b)), and each part is transformed from a 1 × 4096 vector into a 1 × 64 × 64 image sample. Consequently, the original spectral dataset is separated into five sub-datasets, each comprising 39 spectral portions paired with their corresponding normalized voltages. Based on each sub-dataset and some given input target labels y, the CcGAN is trained to generate one part of the synthetic spectrum. By combining the five partial synthetic spectra with the same label, a complete synthetic spectrum is ultimately generated.
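A sketch of this spectrum-to-image transformation: each 20,480-point spectrum is split into five 4096-point parts, and each part is reshaped into a 1 × 64 × 64 image (the row-major reshape order is an assumption).

```python
import numpy as np

def spectrum_to_images(spectrum):
    """Split one 20,480-point spectrum into five 1x64x64 image samples."""
    assert spectrum.shape == (20480,)
    parts = spectrum.reshape(5, 4096)        # parts P1 ... P5
    return parts.reshape(5, 1, 64, 64)       # five discriminator inputs

def images_to_spectrum(images):
    """Inverse mapping: reassemble the five parts into one spectrum."""
    return images.reshape(5, 4096).reshape(20480)
```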

4. Results and discussion

4.1 Training CcGAN

As previously mentioned, the original real dataset is divided into five sub-datasets, each of which is used to train a CcGAN and generate one part of the synthetic spectrum. Let us take one of the sub-datasets as an example to describe the CcGAN's training and generation process. In Table 1, the training parameters for the pre-trained network, the T3 network, and the CcGAN are provided. Referring to the left side of Fig. 1(d), the pre-trained network adopts a fully connected neural network with an input layer, an output layer and two hidden layers. It takes a normalized spectral sample as its input and produces the corresponding normalized label as its output variable. The number of neurons in the input layer is set to 4096, and the number of neurons in each of the two hidden layers is set to 128, which equals the dimension of the high-dimensional feature h. First, we trained the pre-trained network on a sub-dataset of 39 training samples, using the Adam optimizer with a learning rate of 0.001. The training loss fell to $9 \times 10^{-6}$ after 2,000 epochs, indicating that the pre-trained network had been trained well. Referring to the right side of Fig. 1(d), the label embedding network T3 is a fully connected neural network consisting of one input layer, one hidden layer and one output layer. The input layer has a single node, matching the dimension of the label (temperature (voltage)). Both the hidden and output layers consist of 128 nodes, matching the dimension of the high-dimensional feature (h). The T3 network adopts stochastic gradient descent (SGD) with a learning rate of 0.0001 as its optimizer, ReLU as its activation function, and GroupNorm as its normalization technique. After 500 epochs, the training loss was reduced to $3 \times 10^{-5}$, and the training process was stopped. As stated in Section 2, utilizing the trained T2 sub-network, the label embedding network (T3) can effectively transform a regression label y into a high-dimensional feature (h).


Table 1. CcGAN hyperparameters

Once the training of the label embedding network is finished, the training of the generator and discriminator commences. First, we assign a fake label ($y_i^g$) to a fake sample ($x_i^g$) in each iteration. To create a target label, a noise variable is added to each label in the original dataset; the noise variable is generated from a normal distribution with a mean of 0 and a standard deviation of σ. Once the target label is established, each target label is paired with a real sample whose original real label is chosen randomly from a narrow range centered around the target label. During the training process, the utilized dataset comprises pairs of real samples and their corresponding target labels, rather than the original real labels. The corresponding fake label is assigned as follows: the upper and lower bounds of the fake label are obtained by adding noise to and subtracting noise from its corresponding target label, respectively, and the fake label is drawn from a uniform distribution within this range. Second, once each target label is paired with a corresponding real sample and fake label, the adversarial training can begin. Taking 39 random noise vectors (z) of size 1 × 256 and the corresponding fake labels as input, the generator generates 39 synthetic samples. The discriminator loss is computed by combining the paired real samples with their corresponding target labels and the paired synthetic samples with their respective fake labels. From the sample standard deviation $\hat{\sigma}_{y^r}$ of the 39 normalized real labels, the standard deviation of the noise variable $\varepsilon^r$ ($\varepsilon^g$) satisfies $\sigma = 0.1509$. In the experiment, since the interval between two consecutive training labels is constant, the hyperparameter $\kappa$ equals this interval ($\kappa = 0.0263$). The generator loss is computed using the feedback provided by the discriminator. Subsequently, the discriminator and generator are updated iteratively to improve their performance. During training, the generator learns to generate spectral samples that align with the target label, while the discriminator learns to distinguish between real and generated spectral samples conditioned on the target label. Both the generator and discriminator were optimized with the Adam optimizer at a learning rate of $1 \times 10^{-4}$, with the remaining hyperparameters following the adaptive algorithm's defaults.
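The sketch below gives one reading of this label-construction step, using the σ and κ values quoted above; the exact vicinity rule for pairing real samples is an assumption, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng()

def build_pairs(real_labels, sigma=0.1509, kappa=0.0263):
    """Return perturbed target labels, indices of paired real samples,
    and fake labels, following one reading of the procedure above."""
    eps = rng.normal(0.0, sigma, size=real_labels.shape)
    targets = real_labels + eps                 # noise-perturbed target labels
    paired_idx, fake_labels = [], []
    for t in targets:
        # pair the target with a real sample whose label lies near it
        near = np.where(np.abs(real_labels - t) <= kappa)[0]
        if near.size == 0:                      # fall back to the nearest label
            near = np.array([np.argmin(np.abs(real_labels - t))])
        paired_idx.append(rng.choice(near))
        # fake label: uniform within [target - noise, target + noise]
        noise = abs(rng.normal(0.0, sigma))
        fake_labels.append(rng.uniform(t - noise, t + noise))
    return targets, np.array(paired_idx), np.array(fake_labels)
```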

Finally, the quality of the generated synthetic samples is evaluated to optimize the hyperparameters of the training process. The 39 real labels are input into the trained CcGAN, and the generated spectral samples are compared with the corresponding real spectra. In Table 2, the structural similarity (SSIM) between the generated spectra and their corresponding real spectra is presented for various combinations of batch size and iteration count. With the batch size set to 39 and the number of iterations set to 15,000, the SSIM index of 0.982 signifies a nearly perfect match between the spectra. Based on these results, we conclude that the CcGAN has been trained effectively in this case.


Table 2. Calculated structural similarity (SSIM) under different combinations of batch size and iteration
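The SSIM comparison of Table 2 can be reproduced along the following lines with scikit-image; the array names, the assumption of intensities normalized to [0, 1], and the per-image averaging are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity

def mean_ssim(real_imgs, fake_imgs):
    """real_imgs, fake_imgs: arrays of shape (N, 64, 64), matched by label,
    with intensities assumed normalized to [0, 1]."""
    scores = [structural_similarity(r, f, data_range=1.0)
              for r, f in zip(real_imgs, fake_imgs)]
    return float(np.mean(scores))
```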

4.2 Synthetic spectra generated by CcGAN

Once the training of the CcGAN is completed, both the CcGAN model and the label embedding network T3 are saved for future use. Let us give an example to illustrate the generation of synthetic spectra conditioned on continuous conditional labels. For example, to generate 500 sets of synthetic spectra, 500 continuous labels are input into the trained label embedding network T3. These labels are assigned values according to the formula m/(M-1), where M is set to 500 and m ranges from 0 to 499. Furthermore, 500 sets of 256-dimensional random noise vectors, denoted as z, are fed into the trained generator. Consequently, 500 sets of synthetic spectra corresponding to the P1 section (in Fig. 3(b)) are generated. Using the same methodology and the aforementioned given labels, the synthetic spectra for other sections (P2-P5) can be generated accordingly. By combining the five parts of the generated spectra in sequential order under the same label, a total of 500 complete synthetic spectra can be obtained.
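A sketch of this generation step, assuming a trained label embedding network `T3`, five part-wise trained generators in a list `generators`, and a generator signature `G(z, h)`; all names are illustrative.

```python
import torch

M = 500
# labels m/(M-1) for m = 0 ... 499, shaped (500, 1) for the T3 input
labels = torch.arange(M, dtype=torch.float32).unsqueeze(1) / (M - 1)

with torch.no_grad():
    h = T3(labels)                               # label -> 128-dim feature
    parts = []
    for G in generators:                         # one generator per part P1-P5
        z = torch.randn(M, 256)                  # fresh noise for each part
        parts.append(G(z, h).reshape(M, 4096))   # 1x64x64 image -> 4096 vector
    synthetic = torch.cat(parts, dim=1)          # (500, 20480) full spectra
```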

In this case, the data augmentation techniques employed for the regression task differ from those utilized for image classification. The latter focus on creating variations of the input images to increase the size and diversity of the training dataset, whereas the former must support the accurate prediction of a continuous voltage (temperature) value. In this context, data augmentation for regression focuses on generating synthetic samples that closely resemble the original small dataset, and the CcGAN is desired to generate realistic synthetic spectra conditioned on continuous labels that do not appear in the original dataset. Because the proposed CcGAN can accurately estimate the spectrum distribution conditional on such missing labels, this objective has been achieved. Figure 4 shows the quality assessment of the synthetic spectra generated by the CcGAN. We assigned a voltage value of 1.14 V (its normalized label is 150/499) as an input label, which was not originally present in the real dataset. The synthetic spectrum generated by the CcGAN for this voltage value is then compared with the real experimental spectra whose labels are close to it. Figures 4(a) and 4(b) display the experimental real spectra at U = 1.1 V and 1.2 V, respectively. Figure 4(c) displays the synthetic spectrum at U = 1.14 V generated by the CcGAN. The comparison reveals a close resemblance between the synthetic spectrum and the two experimental real spectra in terms of the total number, wavelength, line width, and extinction ratio of the resonant modes. Figure 4(d) shows a detailed comparison among the three spectra at the transmission dip of approximately 1597.05 nm. In the synthetic spectrum, the resonance wavelength and extinction of the dip lie within the range of the resonance wavelength and extinction observed in the two experimental real spectra. This observation aligns precisely with the sensing principle of the SIMRR, confirming that the synthetic spectrum closely approximates its true spectrum in intricate details. This finding suggests that the CcGAN model has effectively captured the inherent patterns and characteristics present in the real spectral data.


Fig. 4. Quality assessment of the synthetic spectra generated by the CcGAN. (a) A real experimental spectrum at U = 1.1 V and (b) at U = 1.2 V. (c) A synthetic spectrum at U = 1.14 V generated by the CcGAN. (d) A comparison between the real experimental spectra (U = 1.1 V, 1.2 V) and the synthetic spectrum (U = 1.14 V) generated by the CcGAN at the transmission dip of approximately 1597.05 nm. (e) PCA clustering results for 39 sets of experimental real spectra and 500 sets of synthetic spectra generated by the CcGAN. (f) PCA clustering results for 39 sets of experimental real spectra and 500 sets of synthetic spectra generated by the BEGAN at $\gamma = 6.5$.


Furthermore, principal component analysis (PCA) is employed to provide a comprehensive evaluation of the similarity between the synthetic spectra and the experimental real spectra [29,30]. Figure 4(e) illustrates the scores of the first two principal components (PCs) for a dataset consisting of 500 synthetic spectra and 39 experimental real spectra. The synthetic spectra exhibit remarkable similarity to the experimental real spectra within the voltage range of 0–3.8 V, making them indistinguishable from one another. Additionally, within this voltage range, the data points are distributed in a relatively uniform pattern, with no notable outliers. For comparison, a BEGAN is also applied to generate 500 sets of synthetic spectra based on the original training dataset. BEGAN provides an alternative approach to address the instability issues commonly observed in traditional GAN training [58]. Nevertheless, the generative labels in BEGAN are typically uncontrolled and randomly generated during the training process. Figure 4(f) displays two-dimensional PCA plots for 39 sets of experimental real spectra and 500 sets of synthetic spectra generated by the BEGAN. While the synthetic spectra demonstrate high quality, their label range is concentrated within a narrower range. Here, the hyperparameter $\gamma$ is optimized as 6.5 to achieve a balance between the quality and diversity of the generated spectra; the related details are provided in Supplement 1. Compared with the BEGAN model, the CcGAN model has finer control over the generative labels and is capable of generating high-quality spectral samples that meet the requirements of regression tasks in continuous scenarios.
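The PCA comparison can be reproduced along these lines with scikit-learn; fitting the components on the pooled real and synthetic spectra is an assumption about the exact procedure.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_scores(real, synthetic):
    """real: (39, 20480) spectra; synthetic: (500, 20480) spectra.
    Returns the first-two-PC scores of each set for a scatter plot."""
    pca = PCA(n_components=2).fit(np.vstack([real, synthetic]))
    return pca.transform(real), pca.transform(synthetic)
```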

4.3 Performance of data augmentation on regression models

As shown in Fig. 2(a), the data augmentation method expands the training dataset for the DNN-based model by incorporating synthetic spectral samples. The DNN network structure and hyperparameters underwent extensive optimization based on the original training dataset to minimize the mean squared error (MSE) on the testing dataset. These numerical calculations confirmed the choice of the following four-layer network structure. The input layer consists of 20,480 nodes, matching the length of each experimental real spectrum. There are two hidden layers, each comprising 128 nodes. The output layer has a single node representing the prediction result (voltage). Table 3 displays the hyperparameters with the range of values investigated in this article. After each hidden layer, a batch normalization (BN) layer is applied, followed by a softmax activation function layer. The training process employs the Adam optimizer with a learning rate ranging from $10^{-2}$ to $10^{-4}$. The model undergoes training for a variable number of epochs ranging from 4000 to 6000 with a batch size ranging from 16 to 32. The DNN model is evaluated using the leave-one-out method, wherein the training dataset is expanded with varying numbers of synthetic spectral samples. As shown in Fig. 2(b), the model is trained and evaluated iteratively, with each iteration leaving out one experimental real spectrum as the test sample and using the rest for training. The measurement process is repeated 20 times for each test spectral sample, and the squared error between the predicted voltage and the actual voltage value is recorded each time. By averaging the prediction results obtained from these repetitions, the influence of random fluctuations or outliers can be mitigated. The MSE is first calculated by averaging the results from multiple measurements of a single test sample and then averaging over the results from multiple test samples. The performance of the DNN model with data augmentation was assessed using this MSE metric.


Table 3. DNN hyperparameters with their range values
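A sketch of the four-layer regression DNN described above (20,480 → 128 → 128 → 1 with batch normalization after each hidden layer). The text specifies a softmax activation after each BN layer; ReLU is substituted here as a more common choice for regression hidden layers, so treat the activation as an assumption.

```python
import torch.nn as nn

# Input: one 20,480-point spectrum; output: one predicted voltage value.
# ReLU is an assumption standing in for the softmax stated in the text.
dnn = nn.Sequential(
    nn.Linear(20480, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 128), nn.BatchNorm1d(128), nn.ReLU(),
    nn.Linear(128, 1),   # single output node: the predicted voltage
)
```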

The loss function quantifies the error between the predicted value and the actual value and plays a crucial role in determining the generalization capability of the DNN model. Figure 5(a) depicts the loss curves during the training and validation phases. As the number of epochs increases, the loss value gradually decreases and eventually stabilizes; when the number of epochs reaches 3000, the validation loss is already close to $5 \times 10^{-4}$. From the curves, we can see that when the number of epochs is sufficient, the predicted voltage values are almost identical to the actual values, indicating that the DNN model has high accuracy.


Fig. 5. Performance of data augmentation on DNN-based regression models. (a) Loss function curves. (b) Effect of increasing synthetic spectral samples on MSE. (c) Under several epochs and batch sizes, the MSEs using the leave-one-out method when generating 2000 sets of synthetic spectra. (d) Scatter plot and MSE versus test index without the CcGAN augmentation phase. (e) Scatter plot and MSE versus test index with the CcGAN augmentation phase (2000 sets of synthetic spectra). (f) Histogram of MSE.


In these experiments, the proposed DNN model was enhanced by incorporating a variable number of synthetic spectral samples. Figure 5(b) displays the box plot of the MSE predicted by the DNN model as the number of augmented synthetic spectra increases. Each box contains 39 MSEs, each obtained by averaging the results from 20 measurements of a single test sample, and most MSE values fall within the box. Based on the outcomes of these experiments, the minimum MSE was achieved when utilizing 2000 synthetic samples. For this case, the box has a height of $3.6 \times 10^{-3}$, representing the range of values, and the median, drawn as a horizontal line at the center, equals $1.36 \times 10^{-3}$. Compared with the box without the CcGAN augmentation phase, the range and median decreased by 54.66% and 71.6%, respectively, indicating a significant improvement in the model's prediction accuracy. After data augmentation, there was a marked decrease in both the magnitude and the number of outliers. Because the MSE is strongly affected by outliers, these changes play a crucial role in enhancing the overall prediction accuracy of the model. In addition, as evident from the length and position of the boxes, data augmentation also significantly reduces the prediction error at non-outlier test points. As the number of synthetic spectra exceeds 500, the prediction accuracy generally remains stable, although there might be a slight decline in performance. The reason is the presence of varying deviations between the CcGAN-generated synthetic spectra and the real spectra; as the number of synthetic spectra increases, this deviation may also grow. Taking into account both the generation cost and the prediction results, the optimal enhancement effect is achieved with 2000 synthetic spectra.

In Fig. 5(b), we optimized the hyperparameters of the DNN to achieve optimal prediction results for different training data scenarios. We take 2000 sets of synthetic spectral data as an example to illustrate the optimization. We scanned the epoch and batch size values to obtain the minimum MSE, configuring the epoch range from 4000 to 6000 with a step size of 1000 and the batch size range from 16 to 32 with a step size of 8. Figure 5(c) shows the calculated MSEs for these epochs and batch sizes. The results show that the minimum MSE of $3.51 \times 10^{-3}$ occurs at epoch = 5000 and batch size = 32. In this scenario, the voltage applied across the microheater was adjusted in increments of 0.1 V, ranging from 0.0 to 3.8 V, and the original samples were assigned test indices ranging from 1 to 39. Figure 5(d) displays the scatter plot and MSE versus test index without the augmentation phase, while Fig. 5(e) shows the scatter plot and MSE versus test index with the augmentation phase, obtained by adding 2000 sets of synthetic spectra to the training dataset. Compared with the MSE without the augmentation phase, the MSE with the augmentation phase exhibits a 54.58% reduction. Figure 5(f) depicts the histogram of the MSEs for all test indices, revealing the count of observations within each bin. Compared with the case without augmentation, the number of test indices with MSE in the large-error ranges decreased significantly, while the number in the low-error ranges noticeably increased. This further substantiates the effectiveness of the data augmentation technique in improving the accuracy and precision of the DNN model's predictions.

Running the DNN algorithm on a computer equipped with an NVIDIA GeForce GTX 1080 Ti GPU, an Intel Core i7 3.7 GHz CPU and 16 GB of memory, the average operation time was 0.054 seconds. As a result, our approach holds great potential for regression sensing applications requiring high accuracy. The aforementioned results validate that the proposed CcGAN is capable of generating high-quality spectral data with predetermined labels in continuous scenarios. Furthermore, when combined with DNN networks, it achieves a superior data augmentation effect. This outcome holds significant importance not only for microcavity-based multimode sensing but also for other spectral sensing techniques used in regression tasks.

5. Conclusion

In this article, we have demonstrated that the proposed CcGAN is capable of generating high-quality spectral data with predetermined labels in continuous scenarios. By employing an improved label input mechanism and leveraging the SVE technique, the CcGAN broadens the range of conditioning to encompass continuous variables, resulting in a more precise estimation of the joint density. During training, the one-dimensional real spectra are transformed into two-dimensional images, which subsequently serve as input for the discriminator. Within the range of the real spectral labels, arbitrary predetermined labels can be input into the trained CcGAN to generate corresponding synthetic spectral samples. The proposed CcGAN was applied to SIMRR-based multimode spectral sensing for a regression task. Compared with the BEGAN model, it greatly improves the efficiency and quality of synthetic spectrum generation. By adding a sufficient number of CcGAN-generated synthetic spectra to the training dataset, the prediction MSE of the DNN regression model was reduced by 54.58%. The proposed CcGAN thus demonstrates exceptional data augmentation capabilities in regression tasks involving optical microcavity-based multimode sensing. Notably, this method also holds significant implications for enhancing other spectral sensing techniques employed in regression tasks.

Funding

Horizontal projects of public institution (KY-H-20221007, KYY-HX-20210893); Open Fund of the State Key Laboratory of Advanced Optical Communication Systems and Networks, China (2020GZKF013); Natural Science Foundation of Zhejiang Province (LY16F050009, LY20F050009); National Natural Science Foundation of China (60907032, 61675183, 61675184).

Acknowledgment

This work was partially carried out at the USTC Center for Micro and Nanoscale Research and Fabrication. The authors thank Dr. X. Ding (Department of Artificial Intelligence at Nanjing University of Information Science & Technology, China) for the fruitful discussions.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. P. Stoica and R. L. Moses, Spectral analysis of signals (Pearson Prentice Hall, 2005).

2. P. Li and J. Hur, “Utilization of UV-Vis spectroscopy and related data analyses for dissolved organic matter (DOM) studies: A review,” Crit. Rev. Environ. Sci. Technol. 47(3), 131–154 (2017). [CrossRef]  

3. A. Tinti, V. Tugnoli, S. Bonora, et al., “Recent applications of vibrational mid-infrared (IR) spectroscopy for studying soil components: a review,” J. Central European Agriculture 16(1), 1535 (2015). [CrossRef]  

4. N. Colthup, Introduction to infrared and Raman spectroscopy (Elsevier, 2012).

5. L. Xie, S. Luo, Y. Liu, et al., “Automatic identification of individual nanoplastics by Raman spectroscopy based on machine learning,” Environ. Sci. Technol., to be published (2023).

6. B. Domon and R. Aebersold, “Mass spectrometry and protein analysis,” Science 312(5771), 212–217 (2006). [CrossRef]  

7. T. Staudacher, F. Shi, S. Pezzagna, et al., “Nuclear magnetic resonance spectroscopy on a (5-nanometer) 3 sample volume,” Science 339(6119), 561–563 (2013). [CrossRef]  

8. X. Zhang, T. Lin, J. Xu, et al., “DeepSpectra: An end-to-end deep learning approach for quantitative spectral analysis,” Anal. Chim. Acta. 1058, 48–57 (2019). [CrossRef]  

9. X. Yu, J. Wang, S. Wen, et al., “A deep learning based feature extraction method on hyperspectral images for nondestructive prediction of TVB-N content in Pacific white shrimp (Litopenaeus vannamei),” Biosyst. Eng. 178, 244–255 (2019). [CrossRef]  

10. Y. Chen and Z. Wu, “Quantitative analysis modeling of infrared spectroscopy based on ensemble convolutional neural networks,” Chemom. Intell. Lab. Syst. 181, 1–10 (2018). [CrossRef]  

11. X. Yu, H. Lu, and D. Wu, “Development of deep learning method for predicting firmness and soluble solid content of postharvest Korla fragrant pear using Vis/NIR hyperspectral reflectance imaging,” Postharvest Biol. Technol. 141, 39–49 (2018). [CrossRef]  

12. S. Krauß, R. Roy, H. Yosef, et al., “Hierarchical deep convolutional neural networks combine spectral and spatial information for highly accurate Raman-microscopy-based cytopathology,” J. Biophotonics. 11(10), e201800022 (2018). [CrossRef]  

13. J. Yang, J. Xu, X. Zhang, et al., “Deep learning for vibrational spectral analysis: Recent progress and a practical guide,” Anal. Chim. Acta. 1081(12), 6–17 (2019). [CrossRef]  

14. A. Signoroni, M. Savardi, M. Pezzoni, et al., “Combining the use of CNN classification and strength-driven compression for the robust identification of bacterial species on hyperspectral culture plate images,” IET Comp. Vision. 12(7), 941–949 (2018). [CrossRef]  

15. J. Padarian, B. Minasny, and A. B. McBratney, “Using deep learning to predict soil properties from regional spectral data,” Geoderma Regional 16, e00198 (2019). [CrossRef]  

16. C. Ni, D. Wang, and Y. Tao, “Variable weighted convolutional neural network for the nitrogen content quantization of Masson pine seedling leaves with near-infrared spectroscopy,” Spectrochim. Acta, Part A 209, 32–39 (2019). [CrossRef]  

17. A. Gholizadeh, L. Borůvka, M. Saberioon, et al., “Comparing different data preprocessing methods for monitoring soil heavy metals based on soil spectral features,” Soil Water Res. 10(4), 218–227 (2015). [CrossRef]  

18. C. Rudin, “Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead,” Nat. Mach. Intell. 1(5), 206–215 (2019). [CrossRef]  

19. H. Tamiminia, B. Salehi, M. Mahdianpari, et al., “Google Earth Engine for geo-big data applications: A meta-analysis and systematic review,” ISPRS Journal of Photogrammetry and Remote Sensing 164, 152–170 (2020). [CrossRef]  

20. P. Muthudoss, I. Tewari, R. L. R. Chi, et al., “Machine learning-enabled NIR spectroscopy in assessing powder blend uniformity: clear-up disparities and biases induced by physical artefacts,” AAPS PharmSciTech 23(7), 277 (2022). [CrossRef]  

21. L. Perez and J. Wang, “The effectiveness of data augmentation in image classification using deep learning,” arXiv, arXiv:1712.04621 (2017). [CrossRef]  

22. C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” J. Big Data 6(1), 1–48 (2019). [CrossRef]  

23. A. Antoniou, A. Storkey, and H. Edwards, “Data augmentation generative adversarial networks,” arXiv, arXiv:1711.04340 (2017). [CrossRef]  

24. I. Goodfellow, J. Pouget-Abadie, M. Mirza, et al., “Generative adversarial networks,” Commun. ACM 63(11), 139–144 (2020). [CrossRef]  

25. X. Mao, Q. Li, H. Xie, et al., “Least squares generative adversarial networks,” In Proceedings of the IEEE International Conference on Computer Vision, 2794–2802 (2017).

26. T. Karras, S. Laine, and T. Aila, “A style-based generator architecture for generative adversarial networks,” In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 4401–4410 (2019).

27. G. Mu and J. Chen, “Developing a conditional variational autoencoder to guide spectral data augmentation for calibration modeling,” IEEE Trans. Instrum. Meas. 71, 1–8 (2022). [CrossRef]  

28. S. Sridevi, T. Kanimozhi, N. Ayyanar, et al., “Deep learning based data augmentation and behavior prediction of photonic crystal fiber temperature sensor,” IEEE Sens. J. 22(7), 6832–6839 (2022). [CrossRef]  

29. G. Teng, Q. Wang, J. Kong, et al., “Extending the spectral database of laser-induced breakdown spectroscopy with generative adversarial nets,” Opt. Express 27(5), 6958–6969 (2019). [CrossRef]  

30. D. Zhu, L. Xu, X. Chen, et al., “Synthetic spectra generated by boundary equilibrium generative adversarial networks and their applications with consensus algorithms,” Opt. Express 28(12), 17196–17208 (2020). [CrossRef]  

31. M. Wu, S. W. Wang, S. Pan, et al., “Deep learning data augmentation for Raman spectroscopy cancer tissue classification,” Sci. Rep. 11(1), 23842 (2021). [CrossRef]  

32. S. D. Frischia, P. Giammatteo, F. Angelini, et al., “Enhanced data augmentation using GANs for Raman spectra classification,” In 2020 IEEE International Conference on Big Data, 2891–2898 (2020).

33. X. Ma, K. Wang, K. C. Chou, et al., “Conditional generative adversarial network for spectral recovery to accelerate single-cell Raman spectroscopic analysis,” Anal. Chem. 94(2), 577–582 (2022). [CrossRef]  

34. D. Morgan and R. Jacobs, “Opportunities and challenges for machine learning in materials science,” Annu. Rev. Mater. Res. 50(1), 71–103 (2020). [CrossRef]  

35. R. S. Das and Y. K. Agrawal, “Raman spectroscopy: Recent advancements, techniques and applications,” Vib. Spectrosc. 57(2), 163–176 (2011). [CrossRef]  

36. F. van der Meer, “Near-infrared laboratory spectroscopy of mineral chemistry: A review,” Int. J. Appl. Earth Obs. Geoinf. 65, 71–78 (2018). [CrossRef]  

37. D. Passos and P. Mishra, “A tutorial on automatic hyperparameter tuning of deep spectral modelling for regression and classification tasks,” Chemom. Intell. Lab. Syst. 223, 104520 (2022). [CrossRef]  

38. J. Huang, N. P. Sullivan, A. Zakutayev, et al., “How reliable is distribution of relaxation times (DRT) analysis? A dual regression-classification perspective on DRT estimation, interpretation, and accuracy,” Electrochim. Acta 443, 141879 (2023). [CrossRef]  

39. A. Zelaci, A. Yasli, C. Kalyoncu, et al., “Generative adversarial neural networks model of photonic crystal fiber based surface plasmon resonance sensor,” J. Lightwave Technol. 39(5), 1515–1522 (2021). [CrossRef]  

40. X. Ding, Y. Wang, Z. Xu, et al., “Continuous conditional generative adversarial networks: novel empirical losses and label input mechanisms,” IEEE Trans. Pattern Anal. Mach. Intell. 45(7), 1–16 (2023). [CrossRef]  

41. X. Ding, Y. Wang, Z. Xu, et al., “CcGAN: Continuous conditional generative adversarial networks for image generation,” In International Conference on Learning Representations (2020).

42. M. Stepien, C. A. Ferreira, S. Hosseinzadehsadati, et al., “Continuous conditional generative adversarial networks for data-driven modelling of geologic CO2 storage and plume evolution,” Gas Sci. Eng. 115, 204982 (2023). [CrossRef]  

43. T. Kadeethum, D. O’Malley, Y. Choi, et al., “Continuous conditional generative adversarial networks for data-driven solutions of poroelasticity with heterogeneous material properties,” Comput. Geosci. 167, 105212 (2022). [CrossRef]  

44. J. Lu, R. Niu, S. Wan, et al., “Experimental demonstration of multimode microresonator sensing by machine learning,” IEEE Sens. J. 21(7), 9046–9053 (2021). [CrossRef]  

45. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” arXiv, arXiv:1411.1784 (2014). [CrossRef]  

46. Z. Lin, V. Sekar, and G. Fanti, “Why spectral normalization stabilizes GANs: Analysis and improvements,” Advances in Neural Information Processing Systems 34, 9625–9638 (2021).

47. H. Ren, C. L. Zou, J. Lu, et al., “Highly sensitive intensity detection by a self-interference micro-ring resonator,” IEEE Photonics Technol. Lett. 28(13), 1469–1472 (2016). [CrossRef]  

48. H. Ren, C. L. Zou, J. Lu, et al., “Dissipative sensing with low detection limit in a self-interference microring resonator,” J. Opt. Soc. Am. B 36(4), 942–951 (2019). [CrossRef]  

49. S. Wan, R. Niu, H. Ren, et al., “Experimental demonstration of dissipative sensing in a self-interference microring resonator,” Photonics Res. 6(7), 681–685 (2018). [CrossRef]  

50. D. Hu, C.-L. Zou, H. Ren, et al., “Multi-parameter sensing in a multimode self-interference micro-ring resonator by machine learning,” Sensors 20(3), 709 (2020). [CrossRef]  

51. H. Chen, Z. Wang, Y. Wang, et al., “Machine learning-assisted high-accuracy and large dynamic range thermometer in high-Q microbubble resonators,” Opt. Express 31(10), 16781–16794 (2023). [CrossRef]  

52. B. Duan, H. Zou, J.-H. Chen, et al., “High-precision whispering gallery microsensors with ergodic spectra empowered by machine learning,” Photonics Res. 10(10), 2343–2348 (2022). [CrossRef]  

53. Z. Li, H. Zhang, B. Nguyen, et al., “Smart ring resonator–based sensor for multicomponent chemical analysis via machine learning,” Photonics Res. 9(2), B38–B44 (2021). [CrossRef]  

54. X. Tian, L. Zhou, L. Li, et al., “Deep learning assisted microwave photonic dual-parameter sensing,” IEEE J. Sel. Top. Quantum Electron., to be published (2023).

55. Q. Peng, A. Gilman, N. Vasconcelos, et al., “Robust deep sensing through transfer learning in cognitive radio,” IEEE Wireless Commun. Lett. 9(1), 38–41 (2020). [CrossRef]  

56. D. C. Cireşan, U. Meier, and J. Schmidhuber, “Transfer learning for Latin and Chinese characters with deep neural networks,” In The 2012 International Joint Conference on Neural Networks (IJCNN), 1–6 (2012).

57. J. Yang, X. Li, H. Lu, et al., “An LIBS quantitative analysis method for alloy steel at high temperature based on transfer learning,” J. Anal. At. Spectrom. 33(7), 1184–1195 (2018). [CrossRef]  

58. D. Berthelot, T. Schumm, and L. Metz, “BEGAN: Boundary equilibrium generative adversarial networks,” arXiv, arXiv:1703.10717 (2017). [CrossRef]  

Supplementary Material (1)

Supplement 1: BEGAN

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.


Figures (5)

Fig. 1. The proposed spectral generation method under continuous labels based on a CcGAN. (a) A typical workflow of the proposed CcGAN framework for spectral generation. The residual block structure of (b) the generator and (c) the discriminator. (d) The pre-trained network on the left and the label embedding network on the right.

Fig. 2. Applying synthetic spectra generated by the CcGAN in the regression model. (a) Flow chart of CcGAN-based data augmentation in a regression model for spectral sensing. (b) Illustration of the leave-one-out method when expanding the dataset with synthetic spectral samples.

Fig. 3. SIMRR-based ergodic spectral sensing aided by machine learning. (a) Schematic of the experimental setup. SIMRR chip: self-interference microring resonator chip; FPC: fiber polarization controller; OSA: optical spectrum analyzer. (b) A typical measured transmission spectrum at the applied voltage U = 1.1 V. (c) Workflow schematic employing a deep neural network (DNN) for temperature prediction.

Fig. 4. Quality assessment of the synthetic spectra generated by the CcGAN. Real experimental spectra at (a) U = 1.1 V and (b) U = 1.2 V. (c) A synthetic spectrum at U = 1.14 V generated by the CcGAN. (d) A comparison between the real experimental spectra (U = 1.1 V, 1.2 V) and the synthetic spectrum (U = 1.14 V) generated by the CcGAN at the transmission dip near 1597.05 nm. (e) PCA clustering results for 39 sets of real experimental spectra and 500 sets of synthetic spectra generated by the CcGAN. (f) PCA clustering results for 39 sets of real experimental spectra and 500 sets of synthetic spectra generated by the BEGAN at $\gamma = 6.5$.

Fig. 5. Performance of data augmentation on DNN-based regression models. (a) Loss function curves. (b) Effect of increasing the number of synthetic spectral samples on the MSE. (c) MSEs obtained with the leave-one-out method under several epoch and batch-size settings when generating 2000 sets of synthetic spectra. (d) Scatter plot and MSE versus test index without the CcGAN augmentation phase. (e) Scatter plot and MSE versus test index with the CcGAN augmentation phase (2000 sets of synthetic spectra). (f) Histogram of the MSE.
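Panels (e) and (f) of Fig. 4 use PCA clustering to test whether the synthetic spectra can be distinguished from the real ones. The following minimal sketch illustrates this kind of check; the file names and array shapes are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# Hypothetical inputs: each row is one transmission spectrum.
real = np.load("real_spectra.npy")             # assumed shape (39, n_points)
synthetic = np.load("synthetic_spectra.npy")   # assumed shape (500, n_points)

# Fit PCA on the combined set so both groups share one projection.
pca = PCA(n_components=2)
scores = pca.fit_transform(np.vstack([real, synthetic]))

n_real = real.shape[0]
plt.scatter(scores[:n_real, 0], scores[:n_real, 1], marker="o", label="real")
plt.scatter(scores[n_real:, 0], scores[n_real:, 1], marker="x", alpha=0.4, label="synthetic")
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.legend()
plt.show()
```

If the two point clouds overlap in the leading principal components, as reported for Fig. 4(e), the synthetic spectra are indistinguishable from the real ones at this level of analysis.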

Tables (3)

Table 1. CcGAN hyperparameters

Table 2. Calculated structural similarity (SSIM) under different combinations of batch size and iteration

Table 3. DNN hyperparameters with their range values
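Table 2 scores generator quality with the structural similarity index (SSIM) between real and synthetic spectra. As a rough illustration of the metric itself (the authors' windowing and implementation details are not specified), the global SSIM of two equal-length spectra follows directly from its standard definition:

```python
import numpy as np

def ssim_global(x, y, data_range=None):
    """Single-window SSIM of two equal-length 1-D signals.

    Uses the standard constants K1 = 0.01 and K2 = 0.03; this is a
    simplified, global version of the metric, not the paper's exact code.
    """
    if data_range is None:
        data_range = max(x.max(), y.max()) - min(x.min(), y.min())
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```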

Equations (2)

$$\hat{\mathcal{L}}^{\mathrm{SVDL}}(D) = -\frac{1}{N_r}\sum_{j=1}^{N_r}\sum_{i=1}^{N_r} \mathbb{E}_{\epsilon_r \sim \mathcal{N}(0,\sigma^2)}\left[ e^{-[y_i^r-(y_j^r+\epsilon_r)]^2/\kappa^2}\,\log D\big(x_i^r,\, y_j^r+\epsilon_r\big)\right] - \frac{1}{N_g}\sum_{j=1}^{N_g}\sum_{i=1}^{N_g} \mathbb{E}_{\epsilon_g \sim \mathcal{N}(0,\sigma^2)}\left[ e^{-[y_i^g-(y_j^g+\epsilon_g)]^2/\kappa^2}\,\log\big(1-D\big(x_i^g,\, y_j^g+\epsilon_g\big)\big)\right]$$

$$\hat{\mathcal{L}}(G) = -\frac{1}{N_g}\sum_{i=1}^{N_g} \mathbb{E}_{\epsilon_g \sim \mathcal{N}(0,\sigma^2)}\,\log D\big(G(z_i,\, y_i^g+\epsilon_g),\, y_i^g+\epsilon_g\big)$$
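The two losses above translate almost line for line into a deep-learning framework. The sketch below is illustrative only: it assumes generator and discriminator callables G(z, y) and D(x, y) that accept batched inputs with scalar labels, assumes D outputs probabilities in (0, 1), and uses placeholder values for the noise scale sigma and kernel width kappa rather than the paper's settings.

```python
import torch

def generator_loss(G, D, z, y_g, sigma=0.05):
    """Estimate of L^(G): perturb each target label with Gaussian noise
    eps_g ~ N(0, sigma^2) and maximize log D on the generated spectra."""
    y_noisy = y_g + sigma * torch.randn_like(y_g)
    fake = G(z, y_noisy)
    return -torch.log(D(fake, y_noisy) + 1e-8).mean()

def svdl_real_term(D, x_r, y_r, sigma=0.05, kappa=0.02):
    """Real-sample half of the SVDL discriminator loss: every sample x_i
    contributes to every perturbed label y_j + eps, weighted by the
    Gaussian vicinity kernel exp(-[y_i - (y_j + eps)]^2 / kappa^2)."""
    n = x_r.shape[0]
    y_target = y_r + sigma * torch.randn_like(y_r)          # y_j + eps, shape (n,)
    w = torch.exp(-(y_r[:, None] - y_target[None, :]) ** 2 / kappa ** 2)  # (i, j)
    log_d = torch.stack(
        [torch.log(D(x_r, y_target[j].expand(n)) + 1e-8) for j in range(n)],
        dim=1,
    )                                                       # (i, j)
    # Sum over samples i, average over perturbed labels j: (1/N_r) sum_j sum_i.
    return -(w * log_d).sum(dim=0).mean()
```

The fake-sample half of $\hat{\mathcal{L}}^{\mathrm{SVDL}}(D)$ is built the same way from generated spectra with $\log(1 - D(\cdot))$, and the hard-vicinal variant replaces the Gaussian kernel with an indicator window.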