This paper investigates the performance of neural network (NN) assisted motion detection (MD) over an indoor optical camera communication (OCC) link. The study evaluates various NN training algorithms, which provide efficient and reliable MD functionality along with vision, illumination, data communications and sensing in indoor OCC. To evaluate the proposed scheme, we carried out an experimental investigation of a static indoor downlink OCC link employing a mobile phone front camera as the receiver and an 8 × 8 red, green and blue light-emitting diode array as the transmitter. In addition to data transmission, MD is achieved using the camera to observe the user’s finger movement in the form of centroids via the OCC link. The captured motion is applied to the NN and is evaluated for a number of MD schemes. The results show that the resilient backpropagation based NN offers the fastest convergence, with a minimum error of 10⁻⁵ within a processing time window of 0.67 s and a success probability of 100 % for MD, compared with the other algorithms. We demonstrate that the proposed system with motion offers a bit error rate below the forward error correction limit of 3.8 × 10⁻³ over a transmission distance of 1.17 m.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Optical wireless communications (OWC) technology, covering the ultraviolet, infrared and visible bands, is complementary to the dominant radio frequency (RF) based wireless systems; it could be used to address the bandwidth bottleneck and is a possible option for the Internet of things (IoT) [1,2]. Communication in the visible spectrum band (i.e., 370–780 nm), known as visible light communications (VLC), is being considered as a possible option in 5th generation (5G) wireless networks for indoor environments. VLC utilizing light-emitting diode (LED) based lighting fixtures offers four independent functionalities of data communications, illumination, localization and sensing in indoor environments [3,4]. In addition, VLC can offer massive multiple-input multiple-output (MIMO) capabilities using LED and photodetector (PD) arrays for IoT applications in both indoor and outdoor environments. This feature of VLC is unique compared to massive MIMO in RF-based systems, which is too complex to implement.
The widespread use of smartphones (six billion of them) with high-spec cameras is opening up new possibilities for VLC in applications where high data rates are not essential [6, 7]. Such applications include indoor localization, sensing, intelligent transportation systems, shopping areas, etc. Camera-based VLC, also termed optical camera communications (OCC), has been studied within the framework of OWC and considered as part of the IEEE 802.15.7r1 standard [8, 9]. OCC utilizes the built-in complementary metal-oxide-semiconductor camera in smart devices as the receiver (Rx) for capturing two-dimensional data in the form of image sequences, thus enabling multidimensional data transmission. OCC with multiple functionalities of vision, data communications, localization and motion detection (MD) [8–10] can be used in all-optical IoT (OIoT) based network applications including device-to-device communications, mobile atto-cells, vehicle-to-everything (V2X), smart environments (home, office, surveillance), etc. In smart environments (i.e., homes and offices), OCC-based MD can be utilized to effectively control smart devices [10,12,13]. This is convenient and cost-effective because users carry smartphones with inbuilt cameras, which can be used as the Rx for both OCC and MD, in contrast to other user interface methods such as gesture control using a single webcam and an infrared 3D camera for PC [14,15]. A number of MD-based schemes have been proposed: (i) Li-Tect, which performs shape detection and 3-D monitoring using visible light sensors; (ii) LiSense, which offers data communication and fine-grained, real-time human skeleton reconstruction using visible light by exploiting the shadowing effect and 324 PDs; and (iii) a number of gesture recognition schemes [14,15].
In OCC, image processing is critical for retrieving the transmitted data from the captured image frames. In recent years, intelligent machine-learning techniques (i.e., neural networks (NN)) have been adopted in image recognition for identifying the shapes of objects in an image, transcribing speech into text, matching classified items and predicting the relevant results from network training. In NN-based feature recognition schemes, multiple hidden layers with artificial neurons are used to train the network. These artificial neurons are the main constituent of the network and receive multiple input samples in order to train the NN.
The authors previously reported initial results of MD performance based on images and centroid data samples (i.e., both considered as the input to the NN representing the motion) using the variable learning rate backpropagation algorithm for training. Those results demonstrate that an NN trained with centroid data samples performs only 5000 iterations in a time window of up to 4 s, while the conventional NN trained using images can perform up to 8138 iterations in a time window of up to 9 s. Even though that work provides a promising approach for MD, such long time windows could not be applied in real-time cases. Since the time windows were obtained using a basic backpropagation algorithm, a more detailed analysis of the proposed scheme is needed, based on the centroid data samples and using different transfer function-based algorithms for the NN, in order to reduce the time window and the number of iterations. In this paper, the focus is on the experimental investigation of NN-based MD for OCC using a number of transfer function-based training algorithms. In doing so, we consider a wide range of training parameters including the processing time (PT), the iterations carried out by the NN for MD, the percentage of success for MD and the mean squared error (MSE). Unlike conventional NN schemes, the proposed NN-based MD is trained with centroid data samples and different transfer function algorithms, thus providing more accurate detection. In this work, experimental investigations are conducted for an indoor static downlink OCC link with a smartphone front camera used as the Rx. The NN training is performed using eight different transfer function-based training algorithms for MD over a transmission distance L of up to 2 m. The OCC link quality in terms of the bit error rate (BER) and peak signal-to-noise ratio (PSNR) with respect to L is also analyzed simultaneously.
The proposed NN-based MD can be used for control of data communications in OIoT networks.
The rest of the paper is structured as follows: Section 2 provides details of the proposed NN based MD in OCC. Experiment results are discussed in Section 3. Conclusions are drawn in Section 4.
2. Proposed NN based MD in OCC
2.1. System overview
Figure 1(a) illustrates the system overview of the proposed OCC-based NN assisted MD in an indoor environment. The output of the data packet generator, which is in a 12.8 kbits non-return-to-zero (NRZ) on-off keying (OOK) format, is first mapped according to the addresses of an 8 × 8 red, green and blue (RGB) Neo pixel LED array using an Arduino Uno board (an open source microcontroller board based on the ATmega328). The intensity modulated (IM) light signal is transmitted over the free space channel. On the Rx side, an Android smartphone’s front camera with a frame rate of 30 frames per second (fps) and a resolution of 1920 × 1080 pixels is used to capture the images (i.e., a video stream) of the IM LED array. In this work, the mobile phone is assumed to be located in a static position directly beneath the LED transmitter (Tx) at a distance of 20 to 200 cm.
Figure 1(b) shows the flowchart of MD and communication analysis on the Rx side. Note that motion is produced by the user’s finger moving over the camera. Both the RGB LEDs and the finger movement are simultaneously captured by the camera in the form of a video stream, which is then divided into frames for post-image processing using MATLAB. Typically, the recorded video length depends on the motion duration Δt, with mean and maximum values of ∼ 2.5 s and 4.5 s, respectively. For a camera with a frame rate of 30 fps, 75 and 135 frames are captured for Δt of 2.5 s and 4.5 s, respectively. As shown in Fig. 1(a), the user’s finger hovering a few centimeters above the camera’s screen results in shadowing and reflected light rays. The illuminated finger is readily traceable by the camera using a tracking function, and its motion is expressed as centroids, which represent the center of the moving finger in the form of consecutive coordinate points, see Fig. 2. In Fig. 2, each coordinate point represents the center of the moving finger tracked in a particular time frame. The key principle of MD is to compare the changes between the frames (a series of images) following video processing. The frame resolution is measured in pixels, and the inter-frame time is 33.3 ms (1 s/30 fps). The motion between two consecutive frames can be simply determined as the difference between the centroid coordinates (x2 − x1, y2 − y1) in the (N + 1)th and Nth frames. The coordinate position of the motion centroid (MC), which is obtained from the user’s finger movement, is applied to a pre-trained NN system for detection and identification of the user’s motions. For demonstration purposes, we consider five motion patterns, which are created from two simple natural motions along straight, circular and curved lines. These motions can be used to control smart devices, e.g., straight and circular motions can be used for turning a device ON and OFF.
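The frame-count and centroid-difference arithmetic described above can be illustrated with a minimal Python sketch. The function names and the sample centroid coordinates are hypothetical and are not taken from the authors' MATLAB processing; only the 30 fps frame rate and the motion durations come from the text.

```python
FPS = 30  # camera frame rate stated in the text

def frames_captured(duration_s, fps=FPS):
    """Number of frames recorded for a motion lasting duration_s seconds."""
    return round(duration_s * fps)

def inter_frame_time_ms(fps=FPS):
    """Time between two consecutive frames in milliseconds."""
    return 1000.0 / fps

def centroid_displacements(centroids):
    """Differences (x2 - x1, y2 - y1) between centroids of consecutive frames.

    centroids is a list of (x, y) pixel coordinates, one per frame.
    """
    return [(x2 - x1, y2 - y1)
            for (x1, y1), (x2, y2) in zip(centroids, centroids[1:])]

if __name__ == "__main__":
    print(frames_captured(2.5), frames_captured(4.5))
    print(round(inter_frame_time_ms(), 1))
    # Hypothetical centroid track of a finger over three frames.
    print(centroid_displacements([(100, 200), (103, 204), (103, 201)]))
```

Under these assumptions, a 2.5 s motion yields 75 frames and a 4.5 s motion yields 135 frames, matching the values quoted in the text.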
2.2. Data compensation scheme
We have adopted a transmit data compensation scheme based on anchor LEDs (four per frame) and a synchronization LED for time synchronization, which is located in the first frame as in earlier work, in order to overcome blocking or shadowing due to mobility, as depicted in Fig. 3(a). The data compensation scheme is based on discarding frames that are damaged by the blocking of the anchor LEDs and requesting re-transmission. Note that obstacles may fully or partially block one or more anchor LEDs, thus resulting in damaged frames, see Fig. 3(b), which lead to increased BERs. The use of anchor LEDs (i.e., four bits per frame in this case) results in reduced data throughput per frame, hence there is a trade-off between the BER and the data throughput.
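The discard-and-retransmit logic of the compensation scheme can be sketched in Python as follows. This is only an illustration under stated assumptions: the anchor LEDs are placed at the four corners of the 8 × 8 array and are always ON, and a frame is represented as an 8 × 8 list of detected bits. Neither the anchor positions nor the function names are specified in the text, and the synchronization LED is ignored here.

```python
# Assumed (hypothetical) anchor positions: the four corners of the 8 x 8 array.
ANCHORS = [(0, 0), (0, 7), (7, 0), (7, 7)]

def frame_is_intact(frame, anchors=ANCHORS):
    """Return True if every anchor LED is detected as ON (bit value 1)."""
    return all(frame[row][col] == 1 for row, col in anchors)

def compensate(frames):
    """Keep intact frames and list the indices of frames to request again."""
    kept, resend = [], []
    for index, frame in enumerate(frames):
        if frame_is_intact(frame):
            kept.append(frame)
        else:
            resend.append(index)
    return kept, resend

def payload_bits_per_frame(rows=8, cols=8, anchors=ANCHORS):
    """Data bits left in a frame once the anchor LEDs are reserved."""
    return rows * cols - len(anchors)

if __name__ == "__main__":
    good = [[1] * 8 for _ in range(8)]
    blocked = [row[:] for row in good]
    blocked[0][0] = 0  # one anchor LED shadowed by the finger
    kept, resend = compensate([good, blocked, good])
    print(len(kept), resend, payload_bits_per_frame())
```

With four anchor LEDs reserved, 60 of the 64 LEDs remain for data in each frame, which is the throughput cost referred to above.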
For the proposed scheme with the transmit data compensation scheme, the data rate can be given as:
For the proposed OCC-based scheme, we have adopted an efficient detection scheme known as the differential detection threshold (DDT) [10,21]. In the DDT scheme, the threshold level is defined in terms of the quantized intensity level within the range of [0–255]. Figure 4(a) represents the identified data area within the frame, while Fig. 4(b) provides the quantized intensity of the detected data. Based on DDT, the initial threshold was set to a quantized intensity level of 181, as in [10, 21]. Note that the threshold level can be adaptively set based on the intensity levels in the image frame.
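A minimal Python sketch of threshold-based bit detection on quantized intensities is given below. Only the initial threshold of 181 and the [0–255] range come from the text. The strict "greater than" comparison and the midpoint rule used for the adaptive threshold are assumptions made for illustration, not the DDT procedure of [10, 21].

```python
INITIAL_THRESHOLD = 181  # initial quantized intensity level stated in the text

def detect_bits(intensities, threshold=INITIAL_THRESHOLD):
    """Map quantized LED intensities (0 to 255) to bits.

    Assumption: an intensity strictly above the threshold is read as 1.
    """
    return [1 if value > threshold else 0 for value in intensities]

def adaptive_threshold(intensities):
    """Hypothetical adaptive rule: midpoint of the brightest and darkest LED."""
    return (max(intensities) + min(intensities)) / 2

if __name__ == "__main__":
    sample = [250, 30, 181, 182]  # hypothetical intensities of four LEDs
    print(detect_bits(sample))
    print(adaptive_threshold([40, 200]))
```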
2.3. NN-based MD for OCC
The proposed scheme can be trained using the transfer function algorithms in order to improve the MD performance by identifying only the predefined motions. Figure 5 illustrates the NN structure for MD performance evaluation within the context of OCC.
The input nodes are the coordinate positions of 100 centroid data samples, which represent 20 centroid data samples per predefined motion (i.e., variants of linear, circular and curvature movements) for an OCC link span ranging from 20 cm to 200 cm. There are two hidden layers of 100 and 5 neurons, respectively. The hidden layers are used to detect and identify the user’s motion, the output of which is expressed in the form of five-bit training labels representing the five predefined motions.
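The five-bit labels and their comparison with the NN output can be illustrated with a short Python sketch. It assumes a one-hot encoding (one bit per predefined motion) and picks the motion with the largest output activation. The text only states that the labels are five bits long, so both choices, as well as the function names, are hypothetical.

```python
NUM_MOTIONS = 5  # five predefined motions, as stated in the text

def one_hot(motion_index, size=NUM_MOTIONS):
    """Five-bit label with a single 1 at the position of the motion (assumed)."""
    if not 0 <= motion_index < size:
        raise ValueError("motion_index out of range")
    return [1 if i == motion_index else 0 for i in range(size)]

def decode(outputs):
    """Index of the largest NN output activation."""
    return max(range(len(outputs)), key=lambda i: outputs[i])

def success_percentage(predicted, expected):
    """Share of experiments in which the detected motion equals the true one."""
    hits = sum(p == e for p, e in zip(predicted, expected))
    return 100.0 * hits / len(expected)

if __name__ == "__main__":
    print(one_hot(2))
    print(decode([0.10, 0.05, 0.80, 0.02, 0.03]))
    print(success_percentage([0, 1, 2, 3], [0, 1, 2, 4]))
```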
Note that, for the training of the NN, we have used eight possible transfer function-based algorithms as listed in Table 1. When selecting the most suitable training algorithm, a number of factors need to be considered, including the number of neurons Nn in the hidden layers, the PT, the error measurement and the type of network used for pattern recognition. In this work, we train the NN with MC and use pattern recognition to classify the input signals or patterns in order to evaluate the link performance.
The key system and NN training parameters are given in Table 2. The training parameters of training goal, iterations and time were set to the same values for all training algorithms in order to evaluate their performance under the same training environment.
Figures 6(a)–6(c) show the experimental results of the detected MC representing variants of linear and circular motions as well as the curvatures. The solid grey line represents the actual considered motions, while the dots represent the detected MC tracked from the user’s finger movement over the smartphone’s front camera when receiving data from the Tx. The coordinate points of these MC are further used to determine the probability of success for MD. Note that, due to tracking, some centroids deviate from the actual motion path (highlighted in small blue circles), while parts of other light sources in the surroundings (highlighted in small red boxes) are also captured. However, the NN training output is not affected by this small number of deviated centroids and other light sources.
To evaluate the system performance, we have used two criteria, the MSE and the PT, for all transfer function algorithms; these are obtained by averaging over 1000 training iterations for the OCC link span ranging from 20 to 200 cm, as depicted in Fig. 7(a). As mentioned in Table 2, the training time limit was set to infinite in order to properly examine all the training algorithms, considering that some will take longer to converge to an accurate predicted output. Note that, in a real-world application using NN with infinite networks, a time complexity approach based on the Markov Chain Monte Carlo method, which is compatible with large networks, can be considered. As shown in Fig. 7(a), the RP algorithm converges faster than the others, reaching the minimum MSE and PT of 5.1 × 10⁻⁵ and 0.67 s, respectively. The conjugate gradient algorithms (SCG, CGB, CGF and CGP) also perform well and can be used in networks with a large number of neuron weights due to their modest memory requirements. Note that the LM algorithm offers the worst performance in terms of both the PT and MSE. This is because LM is designed for least squares problems, which are approximately linear, in contrast to pattern recognition problems where the output neurons are generally saturated. Both the GDX and OSS algorithms converge rapidly if the training is stopped early, but at the cost of inconsistent results. Figure 7(b) illustrates the percentage of success for MD performed over a total of 100 experiments with respect to L for all algorithms listed in Table 1. The percentage of success for MD was determined by comparing the exact input with the five-bit output of the NN, which represents the five predefined motions. It can be seen that RP displays the best performance, with MD accuracies of 100 and 96.5 % over link spans of 1.6 and 2 m (i.e., the maximum range in this work), respectively.
The reduction in accuracy for increasing L is due to the fact that the illumination level of the finger becomes lower as it moves away from the Tx. However, this does not have a significant impact on NN training and therefore these reduced accuracy levels are still acceptable.
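For readers unfamiliar with resilient backpropagation (RP), the following Python sketch shows the sign-based update rule on a toy quadratic objective. It is a simplified variant (no weight backtracking; the update is skipped for one iteration after a gradient sign change) and is not the MATLAB training function used in the experiments. The step factors 1.2 and 0.5 are commonly used values, and the initial step, step limits and toy objective are assumptions.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)

def rprop_minimize(grad_fn, weights, iterations=200, step0=0.1,
                   inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """Minimize a function with a simplified resilient backpropagation rule.

    Only the sign of each partial derivative is used. Each weight has its
    own step size, which grows while the gradient sign is stable and
    shrinks when the sign flips.
    """
    weights = list(weights)
    steps = [step0] * len(weights)
    prev_grad = [0.0] * len(weights)
    for _ in range(iterations):
        grad = grad_fn(weights)
        for i, g in enumerate(grad):
            change = g * prev_grad[i]
            if change > 0:            # same sign as before: accelerate
                steps[i] = min(steps[i] * inc, step_max)
            elif change < 0:          # sign flip: the minimum was overshot
                steps[i] = max(steps[i] * dec, step_min)
                g = 0.0               # skip the update for this weight once
            weights[i] -= sign(g) * steps[i]
            prev_grad[i] = g
    return weights

def toy_gradient(w):
    """Gradient of (w0 - 3)^2 + 10 * (w1 + 1)^2, a hypothetical test objective."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

if __name__ == "__main__":
    print(rprop_minimize(toy_gradient, [0.0, 0.0]))
```

Because the update ignores the gradient magnitude, the two weights converge at a similar pace even though their curvatures differ by a factor of ten, which is one intuition for why RP can train quickly when the output neurons saturate.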
With RP displaying the best performance, we have further investigated the complexity of its NN in terms of the MSE and PT. Note that, in general, Nn in the hidden layers can be larger or smaller than the number of input nodes (i.e., data samples). A large Nn results in a complex NN, whereas a small Nn results in a higher number of training iterations and a longer PT. The trade-off between Nn and the NN training complexity is illustrated in Table 3.
Since the proposed scheme offers simultaneous indoor data transmission via OCC and MD, we next evaluated the link’s BER and PSNR performance. Because in OCC the data is captured in the form of a two-dimensional image, a conventional SNR measurement cannot fully reflect the quality of the link. Therefore, we have adopted PSNR, which is widely used as a quality metric in image processing systems, as given by:
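For reference, the standard definition of PSNR for images with a peak value of 255 is 10·log10(255² / MSE), where the MSE is taken between the reference and the received image. The Python sketch below implements this textbook form, which may differ in detail from the expression used by the authors. Images are represented as flat lists of pixel values, and the function names are hypothetical.

```python
import math

def mean_squared_error(reference, received):
    """Mean squared error between two equally sized flat pixel lists."""
    if len(reference) != len(received):
        raise ValueError("images must have the same number of pixels")
    total = sum((a - b) ** 2 for a, b in zip(reference, received))
    return total / len(reference)

def psnr_db(reference, received, max_value=255):
    """Peak signal-to-noise ratio in dB: 10 * log10(max_value^2 / MSE)."""
    error = mean_squared_error(reference, received)
    if error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / error)

if __name__ == "__main__":
    reference = [255, 0, 255, 0]   # hypothetical transmitted LED pattern
    received = [245, 10, 250, 5]   # hypothetical captured intensities
    print(round(psnr_db(reference, received), 2))
```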
Note that the user’s motion results in partial shadowing, which ultimately affects the BER performance. Figure 8 shows the link’s BER and PSNR performance against L for 12.8 kbits of data and a four-bit header at a data rate Rd of 1.199 kbps, where error-free data transmission is achieved at L up to 80 cm. Note that the BER remains below the forward error correction (FEC) limit of 3.8 × 10⁻³ at L of up to 1.17 m, which is achieved because of the data compensation scheme. A transmission span of 1.17 m is a typical range in environments such as hospital wards. Figure 8 depicts the BER performance as a function of PSNR for the proposed link. At a BER of 10⁻⁵, well below the FEC limit of 3.8 × 10⁻³, the PSNR is ∼ 20 dB.
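The comparison of a measured BER with the FEC limit can be sketched in Python as follows. The FEC limit of 3.8 × 10⁻³ comes from the text. The function names are hypothetical, and treating 12.8 kbits as 12 800 bits is an assumption.

```python
import math

FEC_LIMIT = 3.8e-3  # forward error correction limit stated in the text

def bit_error_rate(sent, received):
    """Fraction of bit positions in which the received bits differ."""
    if len(sent) != len(received):
        raise ValueError("bit sequences must have the same length")
    errors = sum(s != r for s, r in zip(sent, received))
    return errors / len(sent)

def within_fec_limit(ber, limit=FEC_LIMIT):
    """True if the BER is below the FEC limit."""
    return ber < limit

def max_tolerable_errors(num_bits, limit=FEC_LIMIT):
    """Largest number of bit errors that keeps the BER below the limit."""
    return math.ceil(num_bits * limit) - 1

if __name__ == "__main__":
    print(bit_error_rate([1, 0, 1, 1], [1, 0, 0, 1]))
    print(within_fec_limit(1e-5))
    print(max_tolerable_errors(12800))  # assuming 12.8 kbits = 12 800 bits
```

Under these assumptions, a 12 800-bit transmission can contain at most 48 bit errors while staying below the FEC limit.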
Finally, we compared the performance of the proposed NN assisted MD in OCC systems with MoC, TNMD, VLC-based MD and LiSense, as shown in Table 4. In MoC, TNMD and the proposed NN assisted MD in OCC systems, an Android smartphone front camera is used as the Rx, whereas VLC-based MD and LiSense use a PD-based Rx. Note that NN-based schemes offer improved performance compared to VLC-based systems. For the proposed scheme with the RP algorithm, a percentage of success for MD of 96 % is observed at L of up to 200 cm (with a measured PSNR of 16.18 dB). The same percentage of success for MD is achieved for MoC, which uses a complex but reliable quadrant division based MD algorithm, but at L of only 12 cm. A higher percentage of success for MD of 97 % is observed for TNMD with the basic NN algorithm at a maximum L of 125 cm. The improvement offered by the proposed NN assisted MD in the OCC link, which uses a mobile phone camera as the Rx for MD and data transmission, is due to the use of the RP algorithm within the NN.
The performance of the NN assisted MD OCC link was experimentally evaluated for eight different transfer function based training algorithms with training parameters of PT, the number of iterations, the percentage of success for MD and MSE. We showed that the best performance was achieved using the RP algorithm, with the fastest convergence at a minimum error (MSE) and a PT of 10⁻⁵ and 0.67 s, respectively, as well as a percentage of success for MD of 100 % for an OCC link of up to 1.6 m. For higher L, the OCC link will experience shadowing due to the finger’s movement, hence the need for a diversity based Rx. We also demonstrated that, using the transmit data compensation scheme, high-quality data transmission at the FEC limit of 3.8 × 10⁻³ was achieved over a 1.17 m OCC link. The reliability and efficiency of the proposed scheme were assessed by comparing it with other existing techniques. The NN for MD analysis can be further extended to increase the link span based on pattern recognition algorithms and using different transmitter configurations for mobility and multiuser indoor smart home environments. In addition, the data rate can be enhanced using a high capture speed camera with a rolling shutter and a larger LED array as the Rx and the Tx, respectively, in a MIMO OCC link.
H2020 Marie Sklodowska-Curie Innovative Training Network (VisIoN 764461).
1. P. H. Pathak, X. Feng, P. Hu, and P. Mohapatra, “Visible light communication, networking, and sensing: A survey, potential and challenges,” IEEE Commun. Surv. Tutorials 17, 2047–2077 (2015).
2. Z. Ghassemlooy, S. Zvanovec, M.-A. Khalighi, W. O. Popoola, and J. Perez, “Optical wireless communication systems,” Optik 151, 1–6 (2017).
3. S. Zvanovec, P. Chvojka, P. A. Haigh, and Z. Ghassemlooy, “Visible light communications towards 5G,” Radioengineering 24, 1–9 (2015).
4. Z. Ghassemlooy, W. Popoola, and S. Rajbhandari, Optical wireless communications: system and channel modelling with Matlab (CRC press, 2019).
5. S. R. Teli, S. Zvanovec, and Z. Ghassemlooy, “Optical internet of things within 5G: Applications and challenges,” in 2018 IEEE International Conference on Internet of Things and Intelligence System (IOTAIS), (IEEE, 2018), pp. 40–45.
6. R. Boubezari, H. Le Minh, Z. Ghassemlooy, and A. Bouridane, “Smartphone camera based visible light communication,” J. Light. Technol. 34, 4121–4127 (2016).
7. T. Nguyen, A. Islam, T. Hossan, and Y. M. Jang, “Current status and performance analysis of optical camera communication technologies for 5G networks,” IEEE Access 5, 4574–4594 (2017).
8. I. Takai, S. Ito, K. Yasutomi, K. Kagawa, M. Andoh, and S. Kawahito, “LED and CMOS image sensor based optical wireless communication system for automotive applications,” IEEE Photonics J. 5, 6801418 (2013).
9. M. J. Jang, “IEEE 802.15 WPAN 15.7 amendment-optical camera communications study group (SG 7a),” (2019 [Online accessed 6 March 2019]).
10. S. Teli, W. A. Cahyadi, and Y. H. Chung, “Optical camera communication: Motion over camera,” IEEE Commun. Mag. 55, 156–162 (2017).
11. Z. Ghassemlooy, L. N. Alves, S. Zvanovec, and M.-A. Khalighi, Visible light communications: theory and applications (CRC press, 2017).
12. S. R. Teli, W. A. Cahyadi, and Y. H. Chung, “Trained neurons-based motion detection in optical camera communications,” Opt. Eng. 57, 1–4 (2018).
14. N. Lalithamani, “Gesture control using single camera for PC,” Procedia Comput. Sci. 78, 146–152 (2016).
15. D. Ionescu, V. Suse, C. Gadea, B. Solomon, B. Ionescu, and S. Islam, “A new infrared 3D camera for gesture control,” in 2013 IEEE International Instrumentation and Measurement Technology Conference (I2MTC), (IEEE, 2013), pp. 629–634.
16. E. Alizadeh Jarchlo, X. Tang, H. Doroud, V. P. G. Jimenez, B. Lin, P. Casari, and Z. Ghassemlooy, “Li-Tect: 3-D monitoring and shape detection using visible light sensors,” IEEE Sensors J. 19, 940–949 (2019).
17. T. Li, C. An, Z. Tian, A. T. Campbell, and X. Zhou, “Human sensing using visible light communication,” in Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, (ACM, 2015), MobiCom ’15, pp. 331–344.
19. Atmel datasheet, “8-Bit microcontroller with 4/8/16/32K bytes in-system programmable flash,” (Atmel Corporation, 2009).
20. J. C. Nascimento, A. J. Abrantes, and J. S. Marques, “An algorithm for centroid-based tracking of moving objects,” in 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings. ICASSP99 (Cat. No.99CH36258), vol. 6 (IEEE, 1999), pp. 3305–3308.
21. Y. H. Kim and Y. H. Chung, “Experimental outdoor visible light data communication system using differential decision threshold with optical and color filters,” Opt. Eng. 54, 1–3 (2015).
22. Mathworks, “Choose a multilayer neural network training function,” https://in.mathworks.com/help/deeplearning/ug/choose-a-multilayer-neural-network-training-function.html;jsessionid=2e10b001e2c1c97fcac09f6004e6.
23. C. K. I. Williams, “Computation with infinite neural networks,” Neural Comput. 10, 1203–1216 (1998).
24. Q. Huynh-Thu and M. Ghanbari, “Scope of validity of PSNR in image/video quality assessment,” Electron. Lett. 44, 800–801 (2008).