
Deep D2C-Net: deep learning-based display-to-camera communications

Open Access

Abstract

In this paper, we propose Deep D2C-Net, a novel display-to-camera (D2C) communications technique that uses deep convolutional neural networks (DCNNs) for data embedding and extraction with images. The proposed technique consists of fully end-to-end encoding and decoding networks, which respectively produce high-quality data-embedded images and enable robust data acquisition in the presence of an optical wireless channel. For encoding, Hybrid layers are introduced in which the concurrent feature maps of the intended data and cover images are concatenated in a feed-forward fashion; for decoding, a simple convolutional neural network (CNN) is utilized. We conducted experiments in a real-world environment using a smartphone camera and a digital display under multiple parameters, such as transmission distance, capture angle, display brightness, and camera resolution. The experimental results show that Deep D2C-Net outperforms existing state-of-the-art algorithms in terms of peak signal-to-noise ratio (PSNR) and bit error rate (BER), while the data-embedded image displayed on the screen retains high visual quality for the human eye.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

Optical camera communication (OCC) [1–3] is a prominent technology that operates in the visible light spectrum (380–700 nm) and utilizes readily available transceiver components, such as light-emitting diodes (LEDs) and digital cameras. As part of optical wireless communications (OWC) [4–6], which is regarded as a complementary alternative to spectrum-limited radio frequency (RF)-based communications, OCC possesses noteworthy advantages, such as an abundant and unregulated spectrum, link-level security, and easy availability. Owing to its unique ability to separate incident light in the spatial and color domains, OCC has opened promising prospects for machine-to-machine (M2M) interaction services.

Display-to-camera (D2C) communication [7–15] is a promising candidate for short-range M2M communication over a display–camera link. On the transmitter side of a D2C system, digital multimedia content such as images and videos provides visual information via an electronic display along with additional data. The embedded data can alter the appearance and underlying statistics of the images; therefore, embedding should be carried out without compromising the integrity of the displayed content. Specifically, the hidden data should be imperceptible to the human eye. At the receiving end, the camera captures the visual content displayed on the screen, which is then followed by data decoding. The D2C system must deliver seamless and ubiquitous communication and must enable successful acquisition of the embedded data at the receiver. Furthermore, since transmission occurs over the display–camera link in a wireless manner, the embedded data must be robust to several types of signal-processing distortions in the optical wireless channel.

Recent efforts [7–12] have made valuable progress on implementing D2C algorithms, concentrating on embedding data in the form of 2D barcodes or in the spatial and frequency domains of digital images. The QR code [7], a 2D barcode technique in the D2C communication scenario, consists of a square black pattern on a white background for storing information that can be easily read by smartphone cameras. However, due to the limited display area, transmission of large amounts of information is not possible, and additional effort is required to capture the codes. Therefore, 2D color barcodes were investigated [8,9], which facilitate high-capacity data embedding, better synchronization, and accurate code extraction on the receiver side. These techniques enable seamless transfer of large amounts of data but cannot hide data in visually recognizable multimedia content (images/videos). To overcome the constraints of these systems, spatial data-embedding techniques, where data are hidden in the spatial component (i.e., the pixels) of an image, were investigated [10–12]. These D2C approaches not only enable various types of information transfer in a line-of-sight (LOS) environment, but also take the viewing experience of the user into account. This facilitates simultaneous data transmission over optical wireless channels without compromising the visual quality of the multimedia content. However, spatially embedded data are susceptible to the noise induced in optical wireless channels, and thus require complex modulation techniques for robust data communication. For this reason, frequency-based data-embedding techniques [13–15] were explored as an alternative approach to establish full-frame, unobtrusive communication over a D2C link. These techniques exploit the spectral components of the digital image and introduce tiny perturbations in selected coefficients to embed the desired data. Because the data are inserted into robust coefficients of the spectral-domain image, these techniques are relatively less prone to distortion of the embedded data, despite transmission over optical wireless channels.

With recent advancements in the fields of big data and artificial intelligence (AI), tremendous growth has been observed in bridging the gap between the capabilities of humans and machines. To this end, several deep learning (DL) technologies have been applied to create useful representations from the available data to solve real-world problems. Given sufficient data, DL technologies can learn to accomplish real-world tasks beyond human-level accuracy. Among the various DL models, the deep convolutional neural network (DCNN) [16,17], a multi-layer feed-forward network, has been applied widely to image classification [18], semantic segmentation [19], and machinery fault detection [20]. Owing to the preeminence of DCNNs in extracting essential features from raw data, incorporating them for embedding and extracting data in a fully automated D2C link has attracted massive interest among researchers. DCNNs have been used in watermarking and steganography [21,22] for hiding data in an image and extracting it, respectively. These techniques consider the visual imperceptibility of the data hidden inside an image and its robustness to image distortions. Zhang et al. [23] proposed the SteganoGAN model, which adapts popular deep learning architectures such as ResNet [24] and DenseNet [25] for encoding text messages inside images. Although this approach has achieved convincing results in terms of the visual quality of the data-embedded image, the lack of channel information during training of the model limits successful data-recovery performance. Therefore, to approximate the data loss caused by the adverse properties of the optical wireless channel and the phase noise created by the camera's out-of-focus effect, a D2C model trained with noise layers must be introduced. In particular, the DCNN-based encoder and decoder models need to be trained with noise layers; otherwise, communication performance over a real D2C channel cannot be guaranteed. Another similar study, conducted by Zhu et al. [26], introduced HiDDeN, a set of DCNNs for encoding and decoding purposes. The encoder–decoder network of this study considered multiple noise layers in between, such as JPEG compression, cropping, and dropout, to increase the robustness of the encoded data against channel distortions during transmission. However, despite providing impressive visual quality of the display content, the model failed to consider the effect of possible color distortions on the captured image, which can arise from lighting variations in the optical channel. Usually, these types of color distortion significantly affect the embedded data and threaten successful data acquisition. Unlike the above works, Tancik et al. [27] proposed the StegaStamp model, which leverages a U-Net [28] structure for encoding text messages inside digital images. The research successfully demonstrated real-time decoding of hyperlinks from physically printed photographs in the presence of an optical wireless channel. The model achieved this by introducing pixel-wise and spatial perturbations to account for the channel distortions of the transmitted data embedded inside multimedia content. However, the display quality of the multimedia content deteriorated, resulting in a trade-off between visual quality and communication performance. In addition, when the distance between transmitter and receiver is large, or when images are captured at different resolutions and under various lighting conditions, the communication performance is significantly degraded. This can be explained by the limited feature-extraction capability of the decoding network; thus, the feature-extraction capacity of the network needs to be supplemented to show robust performance in the D2C environment.

To overcome these shortcomings, in this paper we propose a novel framework that applies a suitable DL approach to imperceptible data encoding (and robust data decoding) for a D2C system. Specifically, we designed a novel DCNN structure called Deep D2C-Net, an end-to-end network that facilitates real-time encoding and decoding of binary data from digital images. The proposed model integrates DCNNs for data embedding, image reconstruction, and data recovery from captured images. The encoder structure consists of multiple layers, where each layer produces a conjoint feature map by concatenating the feature maps obtained from two convolutional series, one for the cover image and one for the upsampled data. We refer to these layers as Hybrid layers, and together with the subsequent 2D convolutional layers at a later stage they form the encoding network. The Hybrid layer structure of the encoder aims to maximize the peak signal-to-noise ratio (PSNR) of the display content and, in conjunction with the decoder, attempts to minimize the bit error rate (BER) by reducing losses during training. The decoder accepts data-embedded images captured by the camera and passes them through a series of 2D convolutional layers for data extraction. Unlike previous DCNN-based data-embedding approaches, Deep D2C-Net is a novel structure that not only facilitates successful data embedding with minimal visible artifacts, but also considers the impact of the optical wireless channel on the transmitted data by introducing multiple noise layers during the training session. In other words, the trade-off between visual quality and communication performance is substantially alleviated. After conducting real-world experiments under several environmental parameters, such as capture angle θ, transmission distance D, camera resolution, and ambient light, the results show that the proposed scheme outperforms existing state-of-the-art DCNN-based data-embedding and extraction approaches by providing excellent BER performance for a short-distance D2C link. Furthermore, the proposed scheme yields promising visual quality of the data-embedded image in terms of PSNR.

The rest of this paper is organized as follows. In Section 2, the detailed encoder–decoder network architecture of the proposed model is presented. In Section 3, analysis and evaluation of the results obtained from multiple experiments are provided. Finally, concluding remarks are given in Section 4.

2. Proposed model

Figure 1 illustrates an overview of the DCNN-based D2C system. At the transmitter, a binary data vector, ${\boldsymbol b}$, and a digital cover image, ${{\boldsymbol I}_{\boldsymbol o}} \in {\mathbb{ R}}^{\textrm{H} \times \textrm{W} \times \textrm{C}}$ ($\textrm{H}$, $\textrm{W}$, and $\textrm{C}$ are the height, width, and number of channels of ${{\boldsymbol I}_{\boldsymbol o}}$), are fed into the DCNN-based encoder. Before ${\boldsymbol b}$ is passed to the encoder, it is subjected to reshaping and upsampling operations to obtain upsampled data, ${\boldsymbol d}$, in a 2D space with dimensions H${\times} $W${\times} $1. The purpose of the upsampling operation is to facilitate concatenation of the feature maps of both the cover image and the input data during the feed-forward process. After the training session is complete, the learned encoder, $\varepsilon ({{{\boldsymbol I}_{\boldsymbol o}},\; {\boldsymbol d}} )$, produces data-embedded image ${{\boldsymbol I}_{\boldsymbol E}}$ with the same shape as ${{\boldsymbol I}_{\boldsymbol o}}$, which is displayed on a screen for viewing and, at the same time, captured by a camera at the receiver for decoding. In a real-world D2C communications scenario, the captured image is susceptible to multiple noise sources from the signal-processing operations that occur in the optical wireless channel. These noise sources affect both the spatial and spectral domains of the transmitted ${{\boldsymbol I}_{\boldsymbol E}}$ and ultimately impair the embedded data. Therefore, to model the effect of the optical channel on the transmitted image and to compensate for the distortions, several stochastic noise layers are introduced during the training process. The noise layers are responsible for nullifying spatial distortions, such as additive noise, color transformations, and image blur, and spectral distortions, such as JPEG compression, which are caused by the wireless nature of the transmission channel. Similarly, at the receiver, the captured image ${{\boldsymbol I}_{\boldsymbol C}}$ suffers from geometric distortions, such as rotation, scaling, and translation, arising from unsteady capture positions and the orientation of the camera. Therefore, a geometric correction technique is applied to obtain corrected image ${{\boldsymbol I}_{\boldsymbol G}}$, which is then finally passed to the decoder for data retrieval. $\delta ({{{\hat{{\boldsymbol I}}}_{\boldsymbol E}}} )$ is the trained decoder, which, after a robust decoding procedure, produces output data $\hat{{\boldsymbol b}}$. The encoder and decoder are trained in an end-to-end manner with two objectives: to minimize the image reconstruction loss ${L_I}$ (loss between the cover image and the encoded image) and the data reconstruction loss ${L_D}$ (loss between the input and output data).

Fig. 1. Schematic overview of proposed D2C model.

2.1 Reshaping and upsampling the input data

The one-dimensional (1D) input binary data, ${\boldsymbol b} \in {\{{0,1} \}^M}$, of length M is first fed into a fully connected (FC) layer with dimensions of 1024${\times} $1, as shown in Fig. 2. The output of the FC layer, ${{\boldsymbol b}_{\boldsymbol F}}$, which is a 1D vector of length 1024, is reshaped to generate a 2D matrix with dimensions of 32${\times} $32${\times} $1. The 32${\times} $32${\times} $1 matrix is then upsampled by interpolation to obtain ${\boldsymbol d}$. The upsampled data, ${\boldsymbol d}$, with dimensions of H${\times} $W${\times} $1 (the same height and width as the cover image), constitutes a single channel. Note that the reshaping and upsampling blocks replicate the 1D message data spatially in a 2D space. This ensures that the 2D filters in each convolutional layer of the encoding network have access to the entire receptive area of ${\boldsymbol d}$. By doing so, the intermediate representations of ${{\boldsymbol I}_{\boldsymbol o}}$ and ${\boldsymbol d}$, produced by their corresponding convolutional series, can be concatenated to form a Hybrid layer. Therefore, in each Hybrid layer of the encoder, the information carried by ${\boldsymbol d}$ is embedded into the spatial locations of ${{\boldsymbol I}_{\boldsymbol o}}$ through feature map concatenation.
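A minimal PyTorch sketch of this reshaping and upsampling path is shown below, assuming M = 200 bits and a 256×256 cover image as used in the paper; the interpolation mode and layer names are illustrative assumptions, not taken from a released implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

M, H, W = 200, 256, 256          # payload length and cover-image size used in the paper

fc = nn.Linear(M, 1024)          # FC layer producing the 1024-element vector b_F

b = torch.randint(0, 2, (1, M), dtype=torch.float32)   # 1D binary data b
b_F = fc(b)                                             # shape (1, 1024)
b_2d = b_F.view(1, 1, 32, 32)                           # reshape to a 32x32x1 plane
d = F.interpolate(b_2d, size=(H, W), mode="nearest")    # upsample to HxWx1 -> d

print(d.shape)   # torch.Size([1, 1, 256, 256]); d can now be concatenated with I_o feature maps
```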

Fig. 2. Reshaping and upsampling the input data.

2.2 Encoding and decoding networks

Figure 3 shows the complete network architecture of the encoding and decoding structure of the proposed Deep D2C-Net. The encoder network receives upsampled binary data ${\boldsymbol d}$ and cover image ${{\boldsymbol I}_{\boldsymbol o}}$ as input to produce data-embedded image ${{\boldsymbol I}_{\boldsymbol E}}$ for display purposes. As seen in the figure, each input passes through a separate series of 2D convolutional layers to obtain the corresponding intermediate representations. The concurrent feature maps extracted from the convolutional series of both ${{\boldsymbol I}_{\boldsymbol o}}$ and ${\boldsymbol d}$ are concatenated in what we refer to as Hybrid layers. The output of each Hybrid layer is fed forward to the next convolutional layer of the same series. Similarly, the feature maps of ${\boldsymbol d}$ pass to the next convolutional layer of the data series. This process is repeated until the sixth Hybrid layer, whose output is forwarded to the remaining three 2D convolutional layers to obtain ${{\boldsymbol I}_{\boldsymbol E}}$. In each Hybrid layer, ${\boldsymbol d}$ is embedded into ${{\boldsymbol I}_{\boldsymbol o}}$ through feature map concatenation and continuously trained in an end-to-end fashion. This ultimately allows the encoder to learn to conceal the data in the cover image in such a way that visible artifacts on ${{\boldsymbol I}_{\boldsymbol E}}$ are minimized. Moreover, the main purpose of introducing Hybrid layers in the encoder network is not only to achieve a significant gain in the PSNR of ${{\boldsymbol I}_{\boldsymbol E}}$, but also to enhance the BER performance by minimizing training losses. Note that a skip connection is introduced in the proposed network between the original cover image and the eighth convolutional layer. With skip connections, the preliminary features of the cover image are bypassed to the final layers [25]. In doing so, the network can preserve prominent features, such as the textural features of the cover image, and reflect them in the output of the encoder network. In deep networks such as the Hybrid layer-based encoder, image reconstruction tends to suffer from performance degradation because a significant amount of image detail can be corrupted during the feed-forward process. Therefore, skip connections help preserve the features from previous layers and aid in generating high-quality images during reconstruction. Skip connections also mitigate the vanishing-gradient problem of DCNNs [29].
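The sketch below illustrates the Hybrid-layer idea in PyTorch under stated assumptions: only three Hybrid layers are shown (the paper uses six), the filter counts are placeholders rather than the values of Table 1, and the class names are invented for illustration. The exact architecture is the one described in Fig. 3(a) and Table 1.

```python
import torch
import torch.nn as nn

class HybridLayer(nn.Module):
    """One Hybrid layer: convolve the image branch and the data branch
    separately, then concatenate their feature maps channel-wise."""
    def __init__(self, img_ch, data_ch, out_ch):
        super().__init__()
        self.img_conv = nn.Sequential(nn.Conv2d(img_ch, out_ch, 3, padding=1), nn.ReLU())
        self.data_conv = nn.Sequential(nn.Conv2d(data_ch, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, img_feat, data_feat):
        f_img = self.img_conv(img_feat)
        f_data = self.data_conv(data_feat)
        return torch.cat([f_img, f_data], dim=1), f_data   # conjoint map + forwarded data features

class ToyHybridEncoder(nn.Module):
    """Illustrative encoder with three Hybrid layers (the paper uses six),
    a skip connection from the cover image, and final 3x3 convolutions."""
    def __init__(self, ch=32):
        super().__init__()
        self.h1 = HybridLayer(3, 1, ch)
        self.h2 = HybridLayer(2 * ch, ch, ch)
        self.h3 = HybridLayer(2 * ch, ch, ch)
        self.tail = nn.Sequential(
            nn.Conv2d(2 * ch + 3, ch, 3, padding=1), nn.ReLU(),   # +3 for the skipped cover image
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, 3, 3, padding=1))                       # data-embedded image I_E

    def forward(self, I_o, d):
        x, f_d = self.h1(I_o, d)
        x, f_d = self.h2(x, f_d)
        x, f_d = self.h3(x, f_d)
        x = torch.cat([x, I_o], dim=1)    # skip connection: cover image bypassed to final layers
        return self.tail(x)

I_o, d = torch.rand(1, 3, 256, 256), torch.rand(1, 1, 256, 256)
print(ToyHybridEncoder()(I_o, d).shape)   # torch.Size([1, 3, 256, 256])
```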

Fig. 3. Deep D2C-Net architecture: (a) the encoder, and (b) the decoder.

On the receiver side, the data-embedded image shown on the electronic display is detected by the camera sensor. In our scheme, ${{\boldsymbol I}_{\boldsymbol C}}$ is the captured image containing the embedded data. Since the display content is transmitted via the optical wireless channel, the spatial domain of ${{\boldsymbol I}_{\boldsymbol C}}$ is geometrically distorted; in particular, the corners are significantly displaced. Therefore, it is important to apply geometric correction techniques to compensate for these distortions before the image is passed to the decoder. In our scheme, we employ a perspective transformation technique that selects four points of ${{\boldsymbol I}_{\boldsymbol C}}$ and unwarps the image to deliver a geometrically correct image, ${{\boldsymbol I}_{\boldsymbol G}}$. The selected points specify the region of interest (ROI) of the target image. After geometric correction is complete, the image is immediately passed to the decoder, where the embedded data are extracted.
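This four-point perspective correction can be reproduced with a standard homography, for example using OpenCV as sketched below; the corner coordinates and file names are placeholders, and the output size matches the 256×256 decoder input used in the paper.

```python
import cv2
import numpy as np

I_C = cv2.imread("captured_frame.png")          # camera capture containing the displayed content

# Four corners of the displayed image as observed in I_C (ROI), ordered
# top-left, top-right, bottom-right, bottom-left; values here are placeholders.
src = np.float32([[412, 105], [1498, 131], [1472, 989], [388, 955]])

# Target corners of the geometrically corrected image I_G (256x256, as used by the decoder).
dst = np.float32([[0, 0], [255, 0], [255, 255], [0, 255]])

M_persp = cv2.getPerspectiveTransform(src, dst)          # 3x3 homography from the four point pairs
I_G = cv2.warpPerspective(I_C, M_persp, (256, 256))      # unwarped, decoder-ready image
cv2.imwrite("corrected_frame.png", I_G)
```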

At the decoder, the challenge is to recover the distorted signal as accurately as possible. In conventional wireless communication systems, receivers are often designed with prior knowledge of the modulation and coding strategy of the transmitter. In particular, the receiver needs to perform several procedures, such as channel estimation, equalization, demodulation, and channel decoding, in order to recover the data bit stream [30]. Deep D2C-Net, however, utilizes an intelligent DCNN-based decoder that implements end-to-end information recovery from the received signal. The DL-based decoder learns the complex relationship between the received signal and the transmitted sequence of information; thus, the data can be recovered as reliably as possible even under the various non-ideal conditions of the optical wireless channel. To cope with the effects of channel distortion, the DCNN-structured decoder learns the intrinsic representation of the data from the received image, which is distorted in various ways by multiple noise layers. The learned model closely matches the underlying channel distortions that occur in any D2C system, thus achieving robust bit-recovery performance in real-world environments.

As shown in Fig. 3(b), the decoder consists of several convolutional layers (eight 2D convolutional layers) for feature extraction and a single FC layer as a classification layer. Since the input data consist of M bits, a total of M binary classifiers are used in this classification layer, where each binary classifier recovers one bit (0 or 1). The number of classifiers in the final FC layer thus corresponds to the length of the recovered data bit stream, given by $\hat{{\boldsymbol b}} \in {\{{0,1} \}^M}$. Details on the filter sizes and the input and output dimensions of the encoder and decoder networks are presented in Tables 1 and 2, respectively.
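A minimal decoder sketch is given below, assuming eight stride-2 3×3 convolutions and a single FC layer acting as M parallel binary classifiers; the channel counts and strides are placeholders and do not reproduce the exact values of Table 2.

```python
import torch
import torch.nn as nn

class ToyDecoder(nn.Module):
    """Illustrative decoder: stacked 2D convolutions for feature extraction
    followed by a single FC layer acting as M parallel binary classifiers."""
    def __init__(self, M=200, ch=32):
        super().__init__()
        layers, in_ch = [], 3
        for _ in range(8):                                   # eight conv layers, as in Fig. 3(b)
            layers += [nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.ReLU()]
            in_ch = ch
        self.features = nn.Sequential(*layers)
        self.fc = nn.Linear(ch, M)                           # one logit per data bit

    def forward(self, I_G):
        f = self.features(I_G)                               # (N, ch, 1, 1) for a 256x256 input
        logits = self.fc(f.flatten(1))
        return (torch.sigmoid(logits) > 0.5).float()         # recovered bit stream b_hat

b_hat = ToyDecoder()(torch.rand(1, 3, 256, 256))
print(b_hat.shape)    # torch.Size([1, 200])
```

During training the raw logits would be kept and fed to the cross-entropy loss; the hard thresholding shown here corresponds to the test-time bit decision.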

Table 1. Encoding network details.

Table 2. Decoding network details.

2.3 Robustness against the optical wireless channel

The characteristics of the optical wireless channel may vary with the link configuration. Since D2C systems facilitate short-range communications, they usually follow a directed line-of-sight (DLoS) link configuration, where the transmitter is directly oriented towards the receiver. As a result, D2C systems not only exhibit high immunity to multipath fading, but also offer low power requirements and high data rates. However, because a D2C system uses a light-emitting source to transmit information and operates over a fully optical wireless path, the received signal is susceptible to various distortions from the optical wireless channel [31]. If the risk of impairment of the communication link is significant, it will eventually affect the system performance and capacity [32].

In particular, in D2C systems, the received data can be distorted by various factors when transmitted over the optical wireless channel. For example, the LOS configuration (e.g., the position and orientation of the receiver camera), environmental effects (such as ambient lighting and background illumination), and several predominant signal-processing factors (such as analog-to-digital (A/D) and digital-to-analog (D/A) conversions) can significantly degrade the received signal. This impairment of the received signal ultimately leads to loss of the embedded data. Therefore, in order to compensate for the effect of the optical wireless channel and overcome channel distortion, a D2C system should be constructed in consideration of the inherent characteristics of the optical wireless channel.

In the case of DL-based D2C systems, a robust noise model should be introduced between the encoder and decoder to compensate for the effect of channel distortion. Therefore, in our scheme, to overcome the adverse effects of the transmission channel on the embedded data, as well as to approximate the physical distortions of the captured image, we apply a series of stochastic noise layers during the training session. These random sets of transformations are applied to data-embedded image ${{\boldsymbol I}_{\boldsymbol E}}$ before it is passed to the decoder, as follows:

$${\hat{{\boldsymbol I}}_{\boldsymbol E}} = {{\boldsymbol I}_{\boldsymbol E}} + {B_L}(G )+ {C_T} + N + {C_{JPEG}},$$
where ${B_L}(G )$ is Gaussian blur, ${C_T}$ is the color transformation, N is additive Gaussian noise, and ${C_{JPEG}}$ is the JPEG compression layer. Figure 4 presents the sequence of transformed images as they pass through the successive noise layers. Initially, the data-embedded image ${{\boldsymbol I}_{\boldsymbol E}}$ is subjected to Gaussian blur, followed by random color transformation, additive Gaussian noise, and JPEG compression, as seen in Eq. (1). The noise layers used in the network can be adjusted according to the wireless channel environment that affects the transmitted image. Whenever a camera captures an image, a large D, a misaligned capture angle, and unsteady camera movement introduce unwanted image blur. This image blur is usually modeled as phase noise together with additive white Gaussian noise (AWGN), which is commonly used to simulate the wireless communication channel. Taking this into consideration, in the noise layer we apply a Gaussian blur kernel with zero mean and a randomly sampled standard deviation to the data-embedded image ${{\boldsymbol I}_{\boldsymbol E}}$.

Furthermore, whenever an image is captured from the display, it might experience color distortions. In particular, the hue, which represents the tone of a color in an image, can be affected. When capturing an image under diverse lighting conditions, unbalanced saturation in color intensity might also be encountered, causing the image colors to appear faded or washed out. To address this issue, a color transformation approach is utilized to counterbalance the unwanted color shifts and extra brightness/contrast, as seen in Eq. (2):

$${C_T} = {H_{offset}} + {D_e} + {C_{offset}},$$
where ${H_{offset}}$ denotes the uniformly sampled hue offset values in the range [−0.1, +0.1]. These values are added stochastically to each color channel of the RGB data-embedded image. Similarly, ${D_e}$ is the image desaturation value, which is obtained by linearly interpolating between ${{\boldsymbol I}_{\boldsymbol E}}$ and its grayscale version. This accounts for the image saturation problem observed in the captured image due to the properties of the optical wireless channel. Finally, ${C_{offset}}$ is the color offset value that compensates for the unwanted brightness and contrast introduced in the captured image. The ${C_{offset}}$ value is computed as follows:
$${C_{offset}} = \alpha {{\boldsymbol I}_{\boldsymbol E}} + \beta, $$
where $\alpha \in $ [0.5, 1.5] and $\beta \in $ [−0.3, +0.3] are the uniformly sampled brightness and contrast values, respectively.
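A sketch of this color-transformation layer is given below; the transforms of Eqs. (2)–(3) are applied sequentially, the hue offset is realized as a per-channel shift as described in the text, and the desaturation strength range is an assumption (the paper does not quote one).

```python
import torch

def color_transform(I_E: torch.Tensor) -> torch.Tensor:
    """Stochastic color transformation applied to the data-embedded image I_E
    (N x 3 x H x W, values in [0, 1]); a sketch of Eqs. (2)-(3) with the
    individual transforms applied one after the other."""
    # Hue offset: a random shift in [-0.1, 0.1] added to each RGB channel.
    h_offset = (torch.rand(1, 3, 1, 1) - 0.5) * 0.2
    x = I_E + h_offset

    # Desaturation: linear interpolation between the image and its grayscale version.
    gray = (0.299 * x[:, 0:1] + 0.587 * x[:, 1:2] + 0.114 * x[:, 2:3]).repeat(1, 3, 1, 1)
    t = torch.rand(1)                                   # desaturation strength (assumed range [0, 1])
    x = (1 - t) * x + t * gray

    # Brightness/contrast offset: C_offset = alpha * I + beta.
    alpha = 0.5 + torch.rand(1) * 1.0                   # alpha sampled from [0.5, 1.5]
    beta = (torch.rand(1) - 0.5) * 0.6                  # beta sampled from [-0.3, 0.3]
    x = alpha * x + beta
    return x.clamp(0.0, 1.0)

noisy = color_transform(torch.rand(1, 3, 256, 256))
```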

Fig. 4. Sequence of noise layers between the encoder and the decoder.

To account for other noise, such as photon noise and shot noise introduced by the characteristics of the camera, we employ an additive Gaussian noise model N with a standard deviation uniformly sampled within the range [0, 0.2]. Finally, to account for image compression artifacts in the high-frequency regions of the image, a differentiable JPEG approximation [33] is utilized, where the JPEG quality is uniformly sampled within the range [50, 100]. Note that the real-world communication scenario for D2C is entirely optical and wireless in nature. Data transmitted over this channel are subject to distortions, which ultimately make the decoding process difficult at the receiving end. Therefore, the introduction of stochastic noise layers during training ensures that the embedded data are immune to the variety of image distortions that can occur in the optical wireless channel. Furthermore, it is worth noting that these noise layers generalize the fundamental characteristics of the optical wireless channel and are thus applicable to a wide range of environmental changes. However, environmental extremes such as rain, snow, or humid weather cannot be expressed in detail with the existing noise layer model. In such cases, the corresponding noise parameters should be included, and the model should be retrained with a very large dataset. The values of these parameters can be determined (either stochastically or experimentally), and the encoder–decoder model can be retrained accordingly to acquire the desired results, even under environmental extremes. By using all of these random sets of transformations in the training session, a novel DCNN model that is robust to optical channel distortion can be obtained.
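The remaining terms of Eq. (1) can be sketched as below. The blur kernel size and sigma range are assumptions (the text only specifies a zero-mean kernel with a random standard deviation), the additive-noise and JPEG-quality ranges follow the text, and the differentiable JPEG of [33] is replaced here by torchvision's ordinary codec, so this version serves as a simulation aid rather than a trainable layer; the color transformation of Eqs. (2)–(3) would be inserted after the blur step.

```python
import torch
import torchvision.transforms.functional as TF
from torchvision.io import encode_jpeg, decode_jpeg

def channel_noise(I_E: torch.Tensor) -> torch.Tensor:
    """Blur, additive Gaussian noise, and JPEG compression applied to the
    data-embedded image (N x 3 x H x W, values in [0, 1])."""
    # Gaussian blur with a randomly sampled standard deviation (kernel size assumed).
    sigma = 0.1 + 1.9 * torch.rand(1).item()
    x = TF.gaussian_blur(I_E, kernel_size=[7, 7], sigma=[sigma, sigma])

    # Additive Gaussian noise, standard deviation sampled uniformly from [0, 0.2].
    x = (x + 0.2 * torch.rand(1).item() * torch.randn_like(x)).clamp(0.0, 1.0)

    # JPEG compression with quality sampled uniformly from [50, 100].
    quality = int(50 + 50 * torch.rand(1).item())
    jpeg = decode_jpeg(encode_jpeg((x[0] * 255).to(torch.uint8), quality=quality))
    return (jpeg.float() / 255.0).unsqueeze(0)

I_E_hat = channel_noise(torch.rand(1, 3, 256, 256))
```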

2.4 Training

We trained Deep D2C-Net in an end-to-end fashion, with simultaneous training of the encoder and decoder networks. The model was trained on 105,000 cover images from the MS COCO dataset [34], which were resized to the desired dimensions for ${{\boldsymbol I}_{\boldsymbol o}}$. For the input data, we utilized 200 bits, all sampled from a normal distribution. The total training time was approximately 18 hours on an Nvidia GeForce RTX 2080 Ti GPU. To evaluate the model, we used 1000 test images unseen during training, each sampled randomly from all available classes of the Linnaeus dataset [35]. This dataset contains a total of 5 classes, where each class consists of 400 test images of size 256${\times} $256 pixels. The encoding and decoding networks were iteratively optimized with the network parameters presented in Table 3. For training, we employed the Adam optimizer [36] owing to its computational efficiency and good performance when training on large datasets. In addition, except for the last layer, all the convolutional layers in the encoding and decoding networks used a rectified linear unit (ReLU) activation function. Because it does not saturate and is less prone to vanishing gradients, ReLU achieves faster training convergence.

Table 3. Network parameters.

As part of our overall network loss, we devised two loss functions for convergence during training of the model. First, to measure the proximity between the cover image and the data-embedded image, we enforced a mean squared error (MSE) loss ${L_I}$ between them. Second, a cross-entropy (CE) loss, ${L_D}$, was adopted to measure the similarity between the encoded and decoded binary data. Therefore, the training objective is to minimize the total loss ${L_T}$, as given in Eq. (6) below.

$${L_I} = \frac{1}{{H \times W \times C}}\parallel {{\boldsymbol I}_{\boldsymbol o}} - \varepsilon ({{{\boldsymbol I}_{\boldsymbol o}},{\boldsymbol d}} ){\parallel ^2},$$
$${L_D} = \; CE({\delta ({\varepsilon ({{{\boldsymbol I}_{\boldsymbol o}},{\boldsymbol d}} )} ),{\boldsymbol b}} ),$$
$${L_T} = Minimize({{L_I} + {L_D}} ).$$

To reduce the generalization error, the batch size was kept very small (at 4), such that a total of 26,250 steps were required to process the entire training set, constituting one epoch. With a learning rate of $10^{-4}$, the model was trained for a total of 200,000 steps. From the learning curves of Deep D2C-Net evaluated on the training dataset, shown in Figs. 5(a) and 5(b), we conclude that the model was trained well. The figures show that training of the proposed model was run up to the step at which saturation in image quality and communication performance occurs.
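The training objective of Eqs. (4)–(6) with the hyperparameters above (batch size 4, Adam, learning rate $10^{-4}$) can be wired together as in the sketch below; the encoder, decoder, and noise layer are replaced by toy stand-ins purely so the snippet runs, and the number of steps is shortened for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

H, W, C, M = 256, 256, 3, 200

# Toy stand-ins so the snippet executes; the real modules are those of Fig. 3 and Section 2.3.
encoder = nn.Conv2d(C + 1, C, 3, padding=1)                                       # epsilon(I_o, d) -> I_E
decoder = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(C, M))   # delta -> bit logits
noise_layers = lambda img: img + 0.02 * torch.randn_like(img)                     # channel approximation

optimizer = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-4)

for step in range(3):                                    # the paper trains for 200,000 steps
    I_o = torch.rand(4, C, H, W)                         # batch of 4 cover images
    b = torch.randint(0, 2, (4, M), dtype=torch.float)   # batch of data vectors
    d = F.interpolate(b.view(4, 1, 20, 10), size=(H, W)) # crude stand-in for the upsampled data plane

    I_E = encoder(torch.cat([I_o, d], dim=1))            # data-embedded image
    logits = decoder(noise_layers(I_E))                  # decode after channel noise

    L_I = F.mse_loss(I_E, I_o)                           # Eq. (4): image reconstruction loss
    L_D = F.binary_cross_entropy_with_logits(logits, b)  # Eq. (5): data reconstruction loss
    L_T = L_I + L_D                                      # Eq. (6): total loss

    optimizer.zero_grad()
    L_T.backward()
    optimizer.step()
```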

Fig. 5. Learning curves of the training model: (a) encoded image loss, and (b) bit accuracy.

3. Experiments and evaluation

The proposed Deep D2C-Net model was evaluated through extensive experiments in a real-world environment. The data-embedded image was displayed on a digital screen, and the display content was captured with an off-the-shelf smartphone camera. To ensure steady capture and minimize the effect of motion blur, the smartphone was held in a stand. Perfect frame synchronization between transmitter and receiver was assumed when capturing the sequence of images with the camera. The parameters used throughout the experiments are listed in Table 4. A Samsung display with a resolution of 2150×1920 pixels and a display rate (${R_D}$) of 60 Hz was used as the transmitter. For the receiver, we used a OnePlus 5T smartphone camera with a capture resolution of 1920×1080 pixels and a capture rate (${R_C}$) of 120 fps. The whole model was trained to embed 200 bits of binary data in an image of size 256×256 pixels. The experiments were conducted indoors under normal ambient illumination from ceiling lights.

Table 4. Experimental parameters.

3.1 Visual quality, PSNR, and BER

Figure 6 shows randomly selected examples of cover images along with their corresponding data-embedded images as obtained with the state-of-the-art StegaStamp model and the proposed Deep D2C-Net. The average processing time for the encoder to generate a single data-embedded image was approximately 0.41 seconds on our hardware. We can clearly see that the data-embedded images of Deep D2C-Net offer relatively better visual perception to the human eye than those of StegaStamp. Because the feature maps obtained from the convolutional series of the cover image and of the up-sampled data are concatenated in each Hybrid layer, the low-level features are well propagated to the final layers. In this manner, the data-embedded image retains rich spatial information from the cover image, which ultimately helps maximize the PSNR value. In addition, the skip connection introduced between the cover image and the final layers precisely preserves the visual quality of the data-embedded image.

Fig. 6. Randomly selected cover images from the Linnaeus dataset [35] (top row), data-embedded images of StegaStamp [27] (middle row), and data-embedded images of Deep D2C-Net (bottom row).

As shown in Table 5, we compared the PSNR and BER performance of Deep D2C-Net against several state-of-the-art DCNN-based data-embedding and steganography techniques [23,26,27]. For testing, we used the same amount of input data as in training, i.e., 200 data bits were embedded into a single cover image of size 256${\times} $256, and a total of 1000 cover images were randomly sampled from the Linnaeus dataset [35] to evaluate the communication performance. The BER was calculated as the percentage of bits in error relative to the total number of bits received in the test phase. The average time taken by the receiver to decode the embedded data was approximately 0.32 seconds. Note that the processing time taken by the encoder and decoder to generate the corresponding output varies greatly depending on the hardware performance and the DCNN structure, so optimizing the hardware configuration and network structure can shorten the processing time. For the real-world experiments, D and θ between transmitter and receiver were set to 10 cm and 90°, respectively. Altogether, the HiDDeN model comprises an encoder, a decoder, and a composite noise model consisting of multiple layers, including dropout, cropout, Gaussian noise, JPEG mask, and JPEG drop. Using the source code released online, the HiDDeN model yielded a PSNR of 37.84 dB; however, its BER performance in the presence of the optical wireless channel was very poor. Although multiple noise layers were applied to nullify the effect of signal-processing distortions from the optical wireless channel, the embedded data were susceptible to the noise generated by the display unit. Usually, noise such as color distortions and Moire patterns [37] occurs owing to lens distortions and lighting variations when capturing the display with a camera. Overall, we can see that the PSNR of SteganoGAN outperformed the rest of the data-embedding schemes. In particular, SteganoGAN (Dense) achieved the best PSNR among them, which can be attributed to the characteristic of DenseNet of propagating feature maps from all previous layers to the later layers. However, since SteganoGAN does not apply any kind of noise layer during the training session, its BER performance clearly degrades. This steganography mechanism provides robust image quality for the human eye, but it does not serve as a viable D2C candidate because the embedded data are severely degraded by the noise present in the channel. The absence of noise layers during training leaves this model defenseless against the transmission-channel distortions encountered in real-world D2C applications.

Table 5. PSNR and a real-world BER comparison of the proposed system with state-of-the-art DCNN-based steganography techniques.

Notably, the StegaStamp model alleviates all possible shortcomings of these schemes and provides exquisite BER performance. To achieve this, a set of differentiable image perturbations (pixel-wise and spatial) is considered between the encoder and decoder networks when training the model. Specifically, a pipeline of noise layers is introduced that compensates for perspective warps, blurring, color manipulation, noise, and JPEG compression to achieve real-world robustness of the transmitted data. However, this comes at the expense of the visual quality of the data-embedded image, where severe degradation in PSNR is observed. This is the result of the trade-off between communication performance and visual quality in images containing data. Under conditions that ensure robust communication performance, the weakly designed encoder network of StegaStamp fails to produce high-quality images for human vision. Note that our proposed scheme incorporates a similar set of image perturbations against channel noise, as explained in Section 2.3, and achieves excellent BER performance. Moreover, the visual quality of the data-embedded image produced by the Deep D2C-Net encoder is significantly enhanced, with an adequate rise in the PSNR level. Because the concurrent feature maps generated by the convolutional series of the up-sampled data and of the cover image are connected in a Hybrid fashion, the low-level features of the encoding network are well preserved as they proceed through subsequent layers. In addition, the skip connection that propagates the textural information from the spatial locations of the original cover image to the final layers plays an important role in significantly improving the PSNR.

3.2 Achievable data rate

Figure 7 illustrates the achievable data rate (ADR) obtained by Deep D2C-Net compared with state-of-the-art DCNN steganography techniques. The ADR represents how much data can be transmitted over the D2C link without errors per unit time; thus, it can be expressed as a function of the maximum data rate (${R_{max}}$) and the BER, as given by Eq. (7).

$$ADR = ({1 - BER} )\times {R_{max}},$$
where ${R_{max}}$ is calculated as the product of the total number of input bits per frame and the display rate of the transmitter. This experiment was conducted under ideal capture conditions: D and θ were kept constant at 10 cm and 90°, respectively, where the 90° angle represents perfect angular alignment between display and camera. The total number of embedded data bits was set to 200. We can see that the ADRs achieved by Deep D2C-Net and StegaStamp are similar; however, Deep D2C-Net clearly outperformed the rest of the baseline schemes. Overall, the proposed method shows an ADR approximately 82% and 54% greater than SteganoGAN (Dense) and HiDDeN, respectively.
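For reference, Eq. (7) with the parameters used here (200 bits per frame, a 60 Hz display rate) can be evaluated as in the short sketch below; the BER value is a placeholder, not a measured result.

```python
# Achievable data rate (Eq. (7)); values follow the experimental setup in the text,
# except the BER, which is a placeholder.
bits_per_image = 200          # embedded payload per displayed frame
display_rate_hz = 60          # R_D of the transmitter display
R_max = bits_per_image * display_rate_hz          # maximum data rate: 12,000 bit/s

ber = 0.01                    # example BER measured at the receiver (placeholder)
adr = (1 - ber) * R_max
print(f"ADR = {adr:.0f} bit/s")                   # 11,880 bit/s for this example
```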

Fig. 7. Real-world comparison for ADR of Deep D2C-Net with state-of-the-art models.

3.3 Receiver orientation, angle, and distance

To investigate the effect of geometric distortion on the captured image depending on the posture of the camera, the display content was captured from variable distances and angles. Figure 8 shows the geometrically distorted images captured at different orientations of the camera. A large D and a misaligned θ between display and camera give rise to channel-related degradation, in which phase noise created by the camera's out-of-focus depth of field is observed in the captured image. This type of noise usually blurs the captured image and eventually impairs the embedded data. Owing to its similar BER performance under ideal conditions (i.e., constant D and θ), as shown in Table 5, we selected the StegaStamp model as the reference for comparison in this experiment. Throughout this experiment, D was varied from 10 cm to 30 cm, and the display was captured at three different values of θ (30°, 90°, 120°).

Fig. 8. Example images captured by the camera: (a) at D = 15 cm and θ = 30°, (b) at D = 15 cm and θ = 90°, (c) at D = 15 cm and θ = 120°, and (d) at D = 30 cm and θ = 90°.

As shown in Fig. 9, the BER increases with D. A greater distance between display and camera aggravates the intensity of image blur and introduces severe geometric distortions in the captured image, which in turn leads to stringent decoding challenges. We can also see that, for θ = 90° (denoting perfect alignment), the BER is the best among all values of θ. In every experimental setting, the BER performance of Deep D2C-Net outperformed that of StegaStamp. This is due to the robust end-to-end network architecture, which allows the encoder's Hybrid layers to minimize the BER by reducing training losses jointly with the decoding network. Furthermore, the Deep D2C-Net decoder incorporates very deep convolutional layers for detecting high-level features from the captured image, followed by a single FC layer that learns the non-linear combination of those features. In contrast, the decoder of StegaStamp consists of three FC layers for learning non-linear combinations of the features from the preceding convolutional layers. This structural difference enables Deep D2C-Net to provide more meaningful and invariant feature-space representations, regardless of the intensity of image blur. Therefore, with sufficient features extracted by the decoder, the data retrieval performance of Deep D2C-Net is robust to variations in the capture orientation of the camera.

Fig. 9. Comparison of BERs of Deep D2C-Net and StegaStamp when varying D and θ.

3.4 Display brightness and ambient lighting

In this experiment, to observe the effect of display brightness on the BER performance of Deep D2C-Net, images were captured at multiple brightness levels in the range [20, 100]. Furthermore, to investigate the effect of ambient light on the captured image, the smartphone camera was used with and without its flash. As shown in Fig. 10, the images captured under normal and ambient light exhibit two different levels of intensity. Figure 11 presents the BER performance of Deep D2C-Net and StegaStamp with respect to the increasing brightness level of the display under both lighting conditions. Throughout this experiment, D and θ were kept constant at 15 cm and 90°, respectively. We can clearly observe that both schemes demonstrate similar characteristics: as the display brightness increases, the BER decreases, eventually converging to zero. Decoding becomes extremely difficult at low display brightness levels, and thus the system is more prone to error. On the other hand, under ambient light, the pixel intensity of the captured image increases, and the decoder cannot separate this effect from the transmitted digital content. Therefore, when the display content is captured with the flash, a slight increase in the BER can be observed. From the overall characteristics of the BER curves, we observed that Deep D2C-Net outperformed StegaStamp under all working conditions at the same brightness level. This is because the data transmitted through the image are preserved as much as possible by iteratively training the Hybrid-layer encoder together with the decoder in an end-to-end fashion. During training of Deep D2C-Net, the network hyperparameters are optimized to the extent that a minimal loss is achieved, so even in real wireless communication scenarios with ambient light effects, it provides stronger BER performance than StegaStamp. In addition, because the decoder of StegaStamp relies on a weaker technique for learning non-linear combinations of features from the captured image, owing to its multiple FC layers, its BER performance is slightly compromised. This proves that the Deep D2C-Net model can provide robust communication in a real-world D2C environment, despite changes in the brightness level and under ambient light.

Fig. 10. Captured images under different lighting conditions: (a) normal lighting, and (b) ambient lighting with flash.

Fig. 11. Comparison of BERs of Deep D2C-Net with that of StegaStamp by varying screen brightness under different lighting conditions.

3.5 Resolution of the receiver camera

In the experimental results shown in Fig. 12, the BER performance of Deep D2C-Net and StegaStamp is compared with respect to the resolution of the camera. Throughout this experiment, D and θ were set to 15 cm and 90°, respectively, and we selected four different resolution settings, ranging from low resolution (LR) at 320×240 pixels to high resolution (HR) at 1920×1080 pixels. Furthermore, the images were captured under the two lighting conditions (ambient light and normal light) of the previous experiments, as shown in Fig. 10. For both models (Deep D2C-Net and StegaStamp), we can clearly observe from the figure that the BER approaches zero whenever the display is captured at HR settings. Conversely, the error level surges significantly when the camera captures the display at LR settings. Images captured at the LR setting constrain the data extraction capability of the decoder and are thus more prone to data loss, resulting in a higher BER. In addition, Deep D2C-Net always outperformed StegaStamp at the same image resolution. In Deep D2C-Net, the encoder with Hybrid layers hides data well even in low-resolution images, allowing the decoder network to successfully recover the data. The excellent BER performance of Deep D2C-Net can also be attributed to its deep decoder, which extracts salient features from the captured image by passing it through multiple convolutional layers. Because the use of an HR camera is not always possible in practical applications, the Deep D2C-Net model is a better candidate for practical D2C environments.

Fig. 12. Comparison of BERs of Deep D2C-Net and StegaStamp by varying the resolution of the camera under different lighting conditions.

4. Conclusion

This paper introduced Deep D2C-Net, a novel DCNN-based data-embedding and extraction technique for a full-frame D2C system. The input data and the cover image pass through separate series of convolutional layers, whose simultaneous feature maps are concatenated to form Hybrid layers. The encoder structure comprises six Hybrid layers followed by three 2D convolutional layers at a later stage, yielding a data-embedded image free of visible artifacts, which is displayed on a screen and captured by a smartphone camera at the receiver via the optical wireless channel. To compensate for the effect of channel distortions on the transmitted data, multiple noise layers were introduced during training. The decoder at the receiver uses a series of 2D convolutional layers to extract the output data from the captured image. Through end-to-end training, the encoder and decoder networks were iteratively optimized, and we obtained a fair PSNR for the encoded images with robust communication performance when tested in the presence of the optical wireless channel. For a cover image of size 256×256 and 200-bit input data, the proposed system obtained a PSNR of 31.12 dB and a BER of 0 under ideal conditions when evaluated in a real-world environment. By exploiting the capability of DCNNs, the proposed scheme offers a promising approach to successful data embedding and extraction within an image for a D2C system.

The scope of our study is limited to evaluating the visual and communication performance of the overall D2C architecture without using any modulation methods. However, there is the possibility of achieving a higher ADR by employing various modulation techniques to increase the spectral efficiency. The results of this study represent baseline information for a Deep D2C system and provide insights into the development of future OWC systems. For future work, we plan to implement modulation schemes for frequency-based data embedding and extraction using DCNNs. In addition, structural optimization to effectively reduce the computation time of the encoder and decoder is also left for future work.

Funding

National Research Foundation of Korea (NRF-2019R1A2C4069822).

Disclosures

The authors declare no conflicts of interest.

References

1. N. T. Le and Y. M. Jang, “MIMO architecture for optical camera communications,” J. KICS. 42(1), 8–13 (2017). [CrossRef]  

2. P. Luo, M. Zhang, H. L. Minh, H. M. Tsai, X. Tang, L. C. Png, and D. Han, “Experimental demonstration of RGB LED-based optical camera communications,” IEEE Photonics J. 7(5), 1–12 (2015). [CrossRef]  

3. J. Lain, Z. Yang, and T. Xu, “Experimental DCO-OFDM Optical camera communication systems with a commercial smartphone camera,” IEEE Photonics J. 11(6), 1–13 (2019). [CrossRef]  

4. A. Al-Kinani, C. Wang, L. Zhou, and W. Zhang, “Optical wireless communication channel measurements and models,” IEEE Commun. Surv. Tutorials 20(3), 1939–1962 (2018). [CrossRef]  

5. M. Rahaim and T. D. C. Little, “Interference in IM/DD optical wireless communication networks,” J. Opt. Commun. Netw. 9(9), D51–D63 (2017). [CrossRef]  

6. K. Wang, T. Song, S. Kandeepan, H. Li, and K. Alameh, “Indoor optical wireless communication system with continous and simultaneous positioning,” Opt. Express 29(3), 4582–4595 (2021). [CrossRef]  

7. M. Alajmi, I. Elashry, H. S. El-Sayed, and O. S. Farag Allah, “Steganography of encrypted messages inside valid QR codes,” IEEE Access 8, 27861–27873 (2020). [CrossRef]  

8. T. Hao, R. Zhou, and G. Xing, “COBRA: Color barcode streaming for smartphone systems,” in Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services (MobiSys ’12), (2012), pp. 85–98.

9. Q. Wang, M. Zhou, K. Ren, T. Lei, J. Li, and Z. Wang, “Rain Bar: Robust application-driven visual communication using color barcodes,” 2015 IEEE 35th International Conference on Distributed Computing Systems, (2015), pp. 537–546.

10. A. Wang, Z. Li, C. Peng, G. Shen, G. Fang, and B. Zeng, “InFrame++: Achieve simultaneous screen-human viewing and hidden screen-camera communication,” in Proceedings of the 13th Annual International Conference on Mobile Systems, Applications, and Services (MobiSys ’15), (2015), pp. 181–195.

11. K. Jo, M. Gupta, and S. K. Nayar, “DisCo: Display-camera communication using rolling shutter sensors,” ACM Trans. Graph. 35(5), 1–13 (2016). [CrossRef]  

12. T. Li, C. An, A. T. Campell, and X. Zhou, “HiLight: Hiding bits in pixel translucency changes,” SIGMOBILE Mob. Comput. Commun. Rev. 18(3), 62–70 (2015). [CrossRef]  

13. B. W. Kim, H. Kim, and S. Jung, “Display field communication: fundamental design and performance analysis,” J. Lightwave Technol. 33(24), 5269–5277 (2015). [CrossRef]  

14. R. Mushu, T. Wada, K. Mukumoto, and H. Okada, “A proposal of information embedding scheme based on discrete cosine transform in parallel transmission visible light communications,” 2018 IEEE 7th Global Conference on Consumer Electronics (GCCE), (2018), pp. 175–176.

15. L. D. Tamang and B. W. Kim, “Exponential data embedding scheme for display to camera communications,” 2020 International Conference on Information and Communication Technology Convergence (ICTC), (2020), pp. 1570–1573.

16. R. C. Gonzalez, “Deep convolutional neural networks [Lecture Notes],” IEEE Signal Process. Mag. 35(6), 79–87 (2018). [CrossRef]  

17. J. Lemley, S. Bazrafkan, and P. Corcoran, “Deep learning for consumer devices and services: pushing the limits for machine learning, artificial intelligence, and computer vision,” IEEE Consumer Electron. Mag. 6(2), 48–56 (2017). [CrossRef]  

18. Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Top. Appl. Earth Observations Remote Sensing 7(6), 2094–2107 (2014). [CrossRef]  

19. L. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]  

20. D. Neupane and J. Seok, “Bearing fault detection and diagnosis using case western reserve university dataset with deep learning approaches: A review,” IEEE Access 8, 93155–93178 (2020). [CrossRef]  

21. I. Hussain, J. Zeng, S. Xinhong, and Tan, “A survey on deep convolutional neural networks for image steganography and steganalysis,” KSII Trans. on Int. and Info. Sys. 14(3), 1228–1248 (2020). [CrossRef]  

22. K. Haribabu, G. R. K. S. Subrahmanyam, and D. Mishra, “A robust digital image watermarking technique using auto encoder based convolutional neural networks,” IEEE Workshop on Computational Intelligence: Theories, Applications and Future Directions (WCI), (2015), pp. 1–6.

23. K. Zhang, A. Cuesta-Infante, and K. Veeramachaneni, “SteganoGAN: Pushing the limits of image steganography,” arXiv:1901.03892 (2019).

24. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2016), pp. 770–778.

25. G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2017), pp. 2261–2269.

26. J. Zhu, R. Kaplan, J. Johnson, and L. Fei-Fei, “HiDDeN: Hiding data with deep networks,” ECCV 2018, arXiv:1807.09937 (2018).

27. M. Tancik, B. Mildenhall, and R. Ng, “StegaStamp: Invisible hyperlinks in physical photographs,” CVPR 2020, arXiv:1904.05343v2 (2020).

28. O. Ronneberger, P. Fischer, and T. Brox, “U-Net: Convolutional networks for biomedical image segmentation,” MICCAI 2015, arXiv:1505.04597 (2015).

29. X. J. Mao, C. Shen, and Y. B. Yang, “Image restoration using convolutional auto-encoders with symmetric skip connections,” arXiv:1606.08921 (2016).

30. S. Zheng, S. Chen, and X. Yang, “DeepReceiver: a deep learning-based intelligent receiver for wireless communications in the physical layer,” arXiv:2003.14124 (2020).

31. T. Nguyen, A. Islam, T. Hossan, and Y. M. Jang, “Current status and performance analysis of optical camera communication technologies for 5G networks,” IEEE Access 5, 4574–4594 (2017). [CrossRef]  

32. M. Z. Chowdhury, M. K. Hasan, M. Shahjalal, M. T. Hossan, and Y. M. Jang, “Optical wireless hybrid networks: trends, opportunities, challenges, and research directions,” IEEE Commun. Surv. Tutorials 22(2), 930–966 (2020). [CrossRef]  

33. R. Shin and D. Song “JPEG-resistant adversarial images,” https://machine-learning-and-security.github.io/papers/mlsec17_paper_54.pdf.

34. T. Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” arXiv:1405.0312 (2014).

35. G. Chaladze and L. Kalatozishvili, “Linnaeus 5 dataset for machine learning,” (2017), http://chaladze.com/l5/.

36. D.P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv:1412.6980 (2014).

37. K. Patel, H. Han, A. K. Jain, and G. Ott, “Live face video vs. spoof face video: Use of Moire patterns to detect replay video attacks,” International Conference on Biometrics (ICB), (2015), pp. 98–105.
