Affected by severe ambient noise and nonstationary disturbance signals, multi-class event classification is an enormous challenge in several long-haul application fields of distributed vibration sensing (DVS) technology, including perimeter security, railway safety monitoring, pipeline surveillance, etc. In this paper, a deep dual path network with high learning capacity is introduced to solve this problem. Spatial time-frequency spectrum datasets are built by utilizing the multidimensional information of the DVS signal, especially the spatial domain information. With the novel datasets and a high-parameter-efficiency network, the proposed scheme presents good reliability and robustness. Its feasibility is verified in an actual railway safety monitoring field test, as a proof of concept. Seven types of real-life disturbances were implemented and their f1-scores all reached 97% in the test. The performance of the proposed approach is fully evaluated and discussed, and it can be employed to improve the performance of DVS in actual applications.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
With the rapid development of distributed vibration sensing (DVS) [1–7], DVS has been applied in many fields, including perimeter security, railway safety monitoring, pipeline surveillance [10,11], geophysical prospecting [12,13], etc. To identify harmful disturbances, several event detection methods have been developed [14–16]. However, simple binary classification cannot meet the actual needs of applications. Different types of disturbances usually require different disposal plans, and alarm management and disposal become extremely difficult when the disturbance type is unknown, especially for long-haul applications. Therefore, it is essential to develop an appropriate multi-class event classification method that identifies the type of each disturbance. However, severe ambient noises and nonstationary disturbance signals make multi-class event classification extremely difficult in practical applications, owing to, for instance, unpredictable interference sources at different fiber locations and time-varying fiber burial environments.
Some pattern recognition methods have recently been proposed to achieve multi-class event classification [10,17–19]. These methods rely on conventional classifiers, including the support vector machine, relevance vector machine, Gaussian mixture model, etc. The satisfying results of these conventional machine learning methods are usually obtained with a few experimental events, but their robustness and effectiveness cannot meet the demands of practical applications: a well-trained recognition model has to be retrained or tuned when applied to new surroundings or new DVS equipment, and this debugging process is usually time-consuming and exceedingly difficult. In 2016, A. V. Makarenko introduced a deep learning (DL) algorithm to classify events in long perimeter monitoring. In 2017, M. Aktas et al. utilized a five-layer network to achieve multi-threat classification, with datasets built from short-time-Fourier-transform coefficients of direct-detection phase-sensitive optical time domain reflectometry (Ф-OTDR). The results are satisfying and these methods give representative examples of DL [22,23]. However, these networks are relatively shallow, and the classification performance can be improved with deeper networks, as network depth plays an important role in DL. Meanwhile, only simple features at specific fiber locations are utilized in their datasets, covering the time domain, frequency domain, and time-frequency domain, whereas a disturbance affects not a single point but an area along the fiber. The morphology of disturbances is reflected intuitively in the space domain, and thus it is beneficial to include the space domain information in the dataset. Therefore, there is still room for substantial improvement over these conventional DL methods.
In this paper, we introduce a deeper neural network with nearly one hundred layers and develop a multi-dimension DVS dataset to reliably and robustly classify multi-class events in complicated practical applications. The deeper neural network is the dual path network (DPN), developed by Chen Yunpeng et al. The highly efficient DPN, one of the remarkable methods in the DL field [25,26], can extract features effectively and provides high reliability. The DVS dataset samples are constructed from the raw data with integrated time-frequency domain features and space domain features. The multidimensional dataset and the highly efficient DPN are expected to provide strong robustness. As a proof of concept, seven types of disturbances are implemented along a railway, with the existing optical cable utilized as the sensing fiber. The time-varying signal waves of disturbances are well represented in the real-life test data. The high reliability and strong robustness are verified, and the influence of some key factors is discussed. The proposed scheme provides a reliable and practicable multi-class event classification solution for complicated actual applications.
2.1. Deep learning and dual path network
Deep learning (DL) is a modern machine-learning technology with few manually set hyper-parameters and provides a convenient method for pattern recognition, speech processing, etc. Due to its unique feature extraction ability, DL can process natural data in their raw form. It has been verified that this feature-extraction ability is mainly attributed to the power of network depth. Generally, a deeper network achieves better classification performance, including generalization ability and recognition accuracy. The generalization ability is the adaptive capability of a trained network to fresh samples and affects the robustness of the whole system. However, deep networks are difficult to train and converge slowly owing to their many trainable weight parameters.
The dual path network (DPN) is a deep network with tens or hundreds of layers. DPN combines the deep residual network and the densely connected convolutional network, inheriting their respective advantages of effective feature re-usage and feature re-exploitation. DPN is one of the outstanding deep neural networks, and it helped the NUS-Qihoo-DPNs team win multiple championships in the famous ImageNet Large Scale Visual Recognition Challenge (ILSVRC 2017). The dual path architecture of DPN is illustrated in Fig. 1(a), including the residual path (RP) and the densely connected path (DCP). Each block in the RP is based on the residual function, which adds the input features to the output of the block; the residual between input and output is learned by each block, which eases training and makes the network easier to optimize. In the DCP, the input features are concatenated to the output of each block. The increment of the network width is determined by a hyper-parameter, termed the width increment; new raw features, present in the input but different from the output, can thus be exploited in the block. Meanwhile, to improve performance, several homogeneous branches are added in each block; the branch number is termed the cardinality, an important factor distinct from depth and width. The RP and DCP blocks are shown in Figs. 1(b) and 1(c), respectively. With shared connections for the two paths, DPN has low model complexity: although DPN is deeper, its number of trainable weight parameters is relatively small and its parameter efficiency is high, owing to the shared weight parameters and the specific structure. In this paper, the representative DPN92 with 92 layers is chosen, and the hyper-parameters, including the cardinality and the width increment of the DCP, are chosen as suggested [25,30], weighing network size against performance.
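To make the dual path concrete, the following minimal PyTorch sketch shows one dual-path block: a residual path that adds features element-wise, a dense path that concatenates a width increment, and one shared grouped convolution whose group count plays the role of the cardinality. The channel numbers are illustrative placeholders, not the DPN92 configuration.

```python
import torch
import torch.nn as nn

class DualPathBlock(nn.Module):
    """Simplified dual-path block: a residual path (element-wise add)
    and a densely connected path (channel concatenation) sharing one
    grouped 3x3 convolution. Channel sizes are illustrative only."""
    def __init__(self, in_ch, res_ch, inc, cardinality=32):
        super().__init__()
        self.res_ch = res_ch          # channels carried by the residual path
        out_ch = res_ch + inc         # shared output: residual part + increment
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1,
                      groups=cardinality, bias=False),   # cardinality branches
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        y = self.body(x)
        res, dense = x[:, :self.res_ch], x[:, self.res_ch:]
        res = res + y[:, :self.res_ch]                   # RP: feature re-usage
        dense = torch.cat([dense, y[:, self.res_ch:]], dim=1)  # DCP: new features
        return torch.cat([res, dense], dim=1)

block = DualPathBlock(in_ch=96, res_ch=64, inc=32, cardinality=32)
out = block(torch.randn(2, 96, 8, 8))
# the output width grows by the increment: 96 + 32 = 128 channels
```

After each block the channel count grows only by the width increment, which is why the parameter count stays relatively small even for a 92-layer network.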
2.2. Dataset construction
In complicated practical applications, ambient noise is severe and disturbances are usually time-varying, so single-location frequency features and time-frequency features are nonstationary to some extent. The multidimensional DVS information is expected to improve the performance. As stated in the Introduction, the space domain information must be included in the datasets; meanwhile, the time-frequency domain information should also be incorporated, reflecting the wave features of disturbance signals. Thus, spatial time-frequency spectrum datasets are proposed by utilizing the DVS multidimensional information, aiming at high reliability and robustness.
The spatial time-frequency dataset model is explained as follows. The demodulated signal of DVS is expressed as X(z, t), where z is the fiber location and t is the time sequence of probe pulses. Taking IQ-demodulating Ф-OTDR as an example for convenience, the in-phase and quadrature components are demodulated as [9,31]

I(z, t) = A(z, t)cos[φ(z, t)],  Q(z, t) = A(z, t)sin[φ(z, t)],  (1)

where A(z, t) and φ(z, t) are the amplitude and phase of the backscattered signal, recovered as A = (I^2 + Q^2)^(1/2) and φ = arctan(Q/I). The time-frequency spectrum at each fiber location is calculated over a time window of the demodulated signal, and the spectra of adjacent locations are stacked along the space axis [4,5]. The spectrum location scale is limited by the spatial interval of adjacent events and the event spatial scale: a spectrum location scale smaller than the event spatial scale cannot accurately reflect the disturbance morphology, while one much larger than the event spatial interval will cause confusion of multiple events.
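As a sketch of how such a spectrum can be computed from the demodulated two-dimensional data, the snippet below takes the magnitude spectrum along the pulse (time) axis at every fiber location and stacks the results along the space axis. The window length, repetition rate, and array sizes are hypothetical, not values from the paper.

```python
import numpy as np

def spatial_spectrum(x, n_t, fs, f_max):
    """Sketch of spatial time-frequency spectrum construction.

    x     : array of shape (n_locations, n_pulses), demodulated DVS
            signal per fiber location over successive probe pulses
    n_t   : number of pulses in the analysis time window
    fs    : pulse repetition rate in Hz (sampling rate along time)
    f_max : upper frequency bound kept in the spectrum, in Hz

    Returns (spectrum, freqs): spectrum has shape (n_freqs, n_locations),
    so its x-axis is fiber location and its y-axis is frequency.
    """
    seg = x[:, -n_t:]                        # latest time window per location
    spec = np.abs(np.fft.rfft(seg, axis=1))  # magnitude spectrum along time
    freqs = np.fft.rfftfreq(n_t, d=1.0 / fs)
    keep = freqs <= f_max
    return spec[:, keep].T, freqs[keep]

# hypothetical sizes: 500 locations, 100 Hz repetition rate, keep 0-50 Hz
x = np.random.randn(500, 200)
S, f = spatial_spectrum(x, n_t=100, fs=100.0, f_max=50.0)
```

With a 100 Hz repetition rate, frequencies up to the 50 Hz Nyquist limit are available, matching the 50 Hz frequency scale used later for the disturbance datasets.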
The spatial time-frequency spectrum is normalized to the intensity of the signal at zero frequency, as illustrated in Fig. 2(a). The x and y dimensions are location and frequency, in kilometers and Hertz, respectively, and the location scale and frequency scale are fixed. After being converted into a color plane graph using the standard “jet” color map, the spatial time-frequency spectrum is transferred into a three-channel red-green-blue (RGB) image and the visibility is enhanced. In the image, the x and y labels are omitted for generalization ability. The three RGB channels are individually shown in Fig. 2(b). Finally, the RGB image is resized to a fixed pixel size, as a sample for the DPN. Samples of the disturbances of interest are organized into datasets. Note that, once a sample has been transformed into an image, the remaining work becomes image identification, and thus well-developed image identification methods can be used.
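The normalization, color-mapping, and resizing steps described above can be sketched as follows. The 224x224 target size and the use of matplotlib and Pillow are assumptions for illustration; only the "jet" color map and zero-frequency normalization come from the text.

```python
import numpy as np
from matplotlib import cm
from PIL import Image

def spectrum_to_rgb(spec, size=(224, 224)):
    """Turn a spatial time-frequency spectrum (frequency x location)
    into a fixed-size RGB sample. The target size is an assumed
    value, not taken from the paper."""
    # normalize by the zero-frequency (first-row) intensity
    s = spec / (np.abs(spec[0]).max() + 1e-12)
    s = np.clip(s, 0.0, 1.0)
    rgb = (cm.jet(s)[..., :3] * 255).astype(np.uint8)  # drop alpha channel
    img = Image.fromarray(rgb).resize(size)            # resample to input size
    return np.asarray(img)

sample = spectrum_to_rgb(np.random.rand(51, 500))
```

Because the result is an ordinary RGB array, any image-classification pipeline can consume it directly, which is the point made at the end of this section.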
3. Field test results and analysis
3.1. Field test and system setup
As a proof of concept, a field test is carried out for railway safety monitoring, with a DVS system applied to detect vibrations. Due to severe ambient noises, uncontrollable disturbance types and non-repetitive construction conditions, disturbances along the long-haul railway are numerous, multifarious and time-varying. Railway safety monitoring is a dauntingly complicated application of DVS, and it is therefore suitable for verifying the validity of the proposed scheme. The field test arrangement is shown in Fig. 3. The sensing fiber is chosen from the existing communication cable, which is buried about 0.3~0.8 m underground. Several types of human disturbances are implemented, including concrete fence breaking, pedestrians, tamping operation, excavator operation, and moving trains. Considering the instability of disturbance signals, we carried out different implementations of the same types of disturbances, and the data are collected from three separate railway lines. The soil conditions, the relative position between the operation spot and the optical cable, the fiber location, and the disturbance intensity also vary in the field test.
The DVS device and its system schematic are shown in Fig. 4. A conventional IQ-demodulating Ф-OTDR is chosen as the DVS example for convenience. The AOM frequency shift is 160 MHz, the pulse width is 100 ns, and the pulse repetition rate is 100 Hz. An IQ demodulator (IQD) and two LPFs are introduced to demodulate the IQ signals in Eq. (1). The LPF cut-off frequency is 23 MHz and the DPU sampling rate is 100 MSa/s. The components in the gray area are assembled into a DVS device, as shown in Fig. 4.
3.2. Disturbance datasets, network training and testing
The demodulated signals of the disturbances, together with those of local wind and ambient noise, are converted into RGB samples as described in the dataset construction section. Considering the multidimensional features of these disturbances, the location scale and frequency scale are 1 km and 50 Hz, respectively. Typical RGB images of the disturbances are shown in Fig. 5. Some images have a narrow spatial distribution, like Figs. 5(b) and 5(f), while some have a broad one, including Figs. 5(a) and 5(d); the width of Figs. 5(c) and 5(f) is somewhere in between. The spectrum morphology of the excavator operation resembles an acute triangle, while the moving train produces various spectra, changing with the relative location of the moving train in the sampled region, such as a right triangle, right-angled trapezoid, or polygon. Compared with Fig. 5(b), Fig. 5(e) shows a decay with frequency. The ambient noise has randomly distributed spots and randomly occurring single-frequency noise. In short, the image characteristics of these disturbances are morphologically apparent, so the proposed dataset construction method should be effective. The sample count of each disturbance is 1000 and the total sample count is 7000 in the datasets with 7 types of disturbances. The datasets are constructed from multiple tests on different dates and are randomly divided into two parts with uniformly distributed types: one part for network training, termed the training dataset (5600 samples), and the other for network testing, termed the testing dataset (1400 samples).
In the experiments, two deep convolutional neural networks (CNNs), CNN5 and AlexNet, are also introduced as references for the performance of DPN92. AlexNet is an important milestone in DL, in which several tricks were first applied in CNNs, including the rectified linear unit (ReLU), Dropout, local response normalization (LRN), etc. CNN5 is a typical 5-layer CNN composed of three convolutional layers and two fully connected layers, with ReLU as the activation function in each convolutional layer. Their detailed network structures are listed in Table 1. The above datasets are fed into the three networks for network training and testing, and the performance of the three networks is evaluated and compared in field tests.
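A baseline of this 3-conv + 2-FC shape can be sketched as below. The filter counts, hidden width, and pooling choices are placeholders; the actual structure is given in Table 1.

```python
import torch
import torch.nn as nn

class CNN5(nn.Module):
    """Sketch of a 5-layer CNN baseline: three convolutional layers
    with ReLU followed by two fully connected layers. All layer sizes
    here are illustrative assumptions."""
    def __init__(self, n_classes=7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),       # fixed 4x4 map regardless of input
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 4 * 4, 256), nn.ReLU(),
            nn.Linear(256, n_classes),     # one logit per disturbance type
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = CNN5()(torch.randn(2, 3, 224, 224))
```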
In the network training, samples of the training dataset are fed into the three networks and the network weight parameters are optimized continuously. The cross entropy between the network output and the sample label, termed the loss value, is evaluated, and the accuracy can also be calculated. Before fresh samples are fed, the loss values are used to update the network weight parameters by gradient back-propagation and the gradient descent optimizer. The above process is termed an iteration. The sample count of one iteration is the batch size, which is set to 25 in this research, limited by the memory (about 26 GB occupied with DPN92). The accuracy and loss value are plotted during the network training, as shown in Fig. 6. An obvious accuracy improvement and loss value decline are observed with all three networks, which means the dataset construction method is effective. It can be observed that DPN92 has the best convergence rate among the three networks, despite having the most network layers: although the network is deeper than CNN5 and AlexNet, the high parameter efficiency of DPN ensures its high convergence rate. Thus, the strong learning performance of DPN is verified.
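One such iteration can be sketched as follows. The tiny linear model, input size and learning rate are placeholders; the batch size of 25 follows the text.

```python
import torch
import torch.nn as nn

# One training iteration: forward pass, cross-entropy loss against the
# sample labels, gradient back-propagation, and a gradient-descent
# weight update. Model, input size and learning rate are placeholders.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 7))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

images = torch.randn(25, 3, 32, 32)      # one batch of 25 RGB samples
labels = torch.randint(0, 7, (25,))      # disturbance-type labels

optimizer.zero_grad()
logits = model(images)
loss = criterion(logits, labels)                               # loss value
accuracy = (logits.argmax(1) == labels).float().mean().item()  # batch accuracy
loss.backward()                          # gradient back-propagation
optimizer.step()                         # weight update
```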
For network testing, test samples are fed into the trained network to evaluate the generalization ability. As stated above, the generalization ability is the adaptive capability of a trained network to fresh samples: a network with strong generalization ability can accurately identify not only training samples but also test samples. Therefore, we study the accuracy and loss value on the test dataset to evaluate the generalization ability of the trained network. To observe the generalization ability dynamically, we feed the testing samples into each trained network after each iteration of the training process. Without the gradient descent optimizer, network testing occupies less memory than training, and thus a larger batch size (100) is utilized to obtain a reliable mean accuracy. The accuracy variation and loss variation are shown in Fig. 7. As the training continues, the accuracy on the test dataset increases rapidly. It is obvious that DPN92 has the highest accuracy, AlexNet is second, and CNN5 is lowest. The result is consistent with the conclusion that a deeper network achieves higher classification performance. The accuracy of the proposed scheme reaches 0.97 (97%) after 1200 iterations. The accuracy remains high for fresh samples that the network has never seen during training; the generalization ability is strong and the robustness of the proposed multi-class event classification scheme is verified.
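The testing pass differs from training only in that no optimizer state or gradient bookkeeping is kept, which is why the larger batch of 100 fits in memory; a sketch with a placeholder model:

```python
import torch
import torch.nn as nn

# Testing pass sketch: inference mode, no gradients, larger batch.
# The model here is a stand-in, not a trained DPN92.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 7))
model.eval()                                  # inference mode

test_images = torch.randn(100, 3, 32, 32)     # one test batch of 100 samples
test_labels = torch.randint(0, 7, (100,))

with torch.no_grad():                         # disable gradient tracking
    test_logits = model(test_images)
    test_accuracy = (test_logits.argmax(1) == test_labels).float().mean().item()
```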
To further analyze the comprehensive performance of the proposed method and the trained network model, the confusion matrix is obtained as in Fig. 8. Each type of disturbance can be identified correctly, and only disturbances of types 1, 2, and 4 are misclassified with small probability, which reveals the classification challenge posed by severe ambient noise and time-varying disturbances. As shown in the table, few hazardous events were mistakenly identified as ambient noise, which ensures a low false alarm rate. Meanwhile, the precision, recall and f1-score of each type of disturbance are calculated and shown in Table 2. Precision is the proportion of true positive samples among all samples predicted as positive; recall is the proportion of true positive samples among all actually positive samples; the f1-score is the harmonic mean of precision and recall and is usually used as the final evaluation index. The f1-scores are all above 0.97 and the effectiveness of the proposed method is further confirmed.
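These three metrics can be computed directly from a confusion matrix; the sketch below uses a hypothetical 3-class matrix for illustration, not the actual values behind Table 2.

```python
import numpy as np

def per_class_metrics(cm):
    """Precision, recall and f1-score per class from a confusion
    matrix where cm[i, j] counts samples of true class i predicted
    as class j."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # TP / all predicted as this class
    recall = tp / cm.sum(axis=1)      # TP / all truly in this class
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# hypothetical 3-class confusion matrix, for illustration only
cm = np.array([[98, 1, 1],
               [2, 97, 1],
               [0, 1, 99]])
p, r, f1 = per_class_metrics(cm)
```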
4.1. Batch size and convergence rate
In network training, the batch size has an important influence on the convergence rate. A large batch can make full use of the system's computational power and speed up training; however, large-batch training suffers from a generalization problem. To study the influence of the batch size, we carry out some experiments. In some respects, the batch size is relative to the dataset size: increasing the batch size and decreasing the dataset size have an equivalent influence on the convergence rate. Limited by the experimental conditions, we fix the batch size at 25 and vary the dataset size; the dataset sizes are 1400 and 5600. The accuracy variations on the training datasets are shown in Fig. 9. The accuracy of the small dataset stabilized after about 90 iterations, while the large dataset needed more than 250 iterations.
4.2. Elapsed time of network training
The training of deep networks is usually time-consuming, and an appropriate computing platform can speed it up. Therefore, we studied the elapsed time of DPN training on different platforms. Three platforms are chosen and the corresponding configurations are listed in Table 3. The proposed DVS dataset is fed into the DPN with a batch size of 1, limited by memory, and the result is shown in Fig. 10. The elapsed time on the graphics-processing-unit (GPU) platforms is far less than that on the central-processing-unit (CPU) platform: GPU-based parallel computation can greatly speed up training. Meanwhile, a higher kernel frequency and larger memory can speed up network training further. With a large dataset, the network training may take several days to achieve feasible accuracy and generalization ability. It is worth noting that network prediction needs no gradient descent optimizer, so it is easy to achieve real-time online event classification with appropriate GPUs.
In conclusion, a multi-class event classification scheme is demonstrated for complicated applications with nonstationary signals and severe noises. A conventional DVS is applied to detect signals and a deep DPN is introduced to classify events. The multidimensional information of DVS disturbance signals is utilized to construct datasets with simple preprocessing, providing high robustness, while the DPN's high parameter efficiency and strong learning ability provide high accuracy. Therefore, the proposed scheme is reliable, practicable and robust for complicated actual applications. A real-life railway monitoring field test is carried out as a proof of concept and the trained network is fully evaluated with the test dataset. The effectiveness and robustness are experimentally verified, and the batch size and elapsed time are discussed. The proposed scheme is expected to be promising for complicated distributed safety monitoring applications and to expand the application range of DVS.
National Key R&D Program of China (2017YFB0405501); National Natural Science Foundation of China (61675216); Natural Science Foundation of Shanghai (18ZR1444600); Science and Technology Commission of Shanghai Municipality (18DZ1201303).
The authors thank Shanghai Railway Bureau and the Youth Innovation Promotion Association of Chinese Academy of Science for their support.
1. H. F. Taylor and C. E. Lee, “Apparatus and method for fiber optic intrusion sensing,” U.S. Patent, 5194847, (1993).
2. Z. Wang, J. Zeng, J. Li, F. Peng, L. Zhang, Y. Zhou, H. Wu, and Y. Rao, “175km Phase-sensitive OTDR with Hybrid Distributed Amplification,” Proc. SPIE 9157, 9157D5 (2014).
3. Z. Wang, Z. Pan, Z. Fang, Q. Ye, B. Lu, H. Cai, and R. Qu, “Ultra-broadband phase-sensitive optical time-domain reflectometry with a temporally sequenced multi-frequency source,” Opt. Lett. 40(22), 5192–5195 (2015). [CrossRef] [PubMed]
4. Q. Liu, X. Fan, and Z. He, “Time-gated digital optical frequency domain reflectometry with 1.6-m spatial resolution over entire 110-km range,” Opt. Express 23(20), 25988–25995 (2015). [CrossRef] [PubMed]
5. D. Iida, K. Toge, and T. Manabe, “Distributed measurement of acoustic vibration location with frequency multiplexed phase-OTDR,” Opt. Fiber Technol. 36, 19–25 (2017). [CrossRef]
6. J. Zhang, T. Zhu, H. Zheng, K. Yang, M. Liu, and H. Wei, “Breaking through the bandwidth barrier in distributed fiber vibration sensing by sub-Nyquist randomized sampling,” Proc. SPIE 10323, 103238H (2017).
7. Z. Pan, K. Liang, Q. Ye, H. Cai, R. Qu, and Z. Fang, “Phase-sensitive OTDR system based on digital coherent detection,” Proc. SPIE 8311, 83110S (2011).
9. Z. Wang, B. Lu, H. Zheng, Q. Ye, Z. Pan, H. Cai, R. Qu, Z. Fang, and H. Zhao, “Novel railway-subgrade vibration monitoring technology using phase-sensitive OTDR,” Proc. SPIE 10323, 103237G (2017).
10. J. Tejedor, J. Macias-Guarasa, H. F. Martins, S. Martin-Lopez, and M. Gonzalez-Herraez, “A Gaussian Mixture Model-Hidden Markov Model (GMM-HMM)-based fiber optic surveillance system for pipeline integrity threat detection,” in 26th International Conference on Optical Fiber Sensors (Optical Society of America, 2018), paper WF36. [CrossRef]
11. J. Chen, H. Wu, X. Liu, Y. Xiao, M. Wang, M. Yang, and Y. Rao, “A Real-Time Distributed Deep Learning Approach for Intelligent Event Recognition in Long Distance Pipeline Monitoring with DOFS,” in International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (IEEE, 2018), pp. 290–2906. [CrossRef]
12. P. Jousset, T. Reinsch, T. Ryberg, H. Blanck, A. Clarke, R. Aghayev, G. P. Hersir, J. Henninges, M. Weber, and C. M. Krawczyk, “Dynamic strain determination using fibre-optic cables allows imaging of seismological and structural features,” Nat. Commun. 9(1), 2509 (2018). [CrossRef] [PubMed]
13. S. Dou, N. Lindsey, A. M. Wagner, T. M. Daley, B. Freifeld, M. Robertson, J. Peterson, C. Ulrich, E. R. Martin, and J. B. Ajo-Franklin, “Distributed Acoustic Sensing for Seismic Monitoring of The Near Surface: A Traffic-Noise Interferometry Case Study,” Sci. Rep. 7(1), 11620 (2017). [CrossRef] [PubMed]
15. H. Wu, Z. Wang, F. Peng, Z. Peng, X. Li, Y. Wu, and Y. Rao, “Field test of a fully-distributed fiber-optic intrusion detection system for long-distance security monitoring of national borderline,” Proc. SPIE 9157, 915790 (2014). [CrossRef]
16. J. Tejedor, J. M. Guarasa, H. F. Martins, P. G. Juan, M. L. Sonia, P. C. Guillen, D. P. Guy, D. S. Filip, P. Willy, C. H. Ahlen, and G. H. Miguel, “Real Field Deployment of a Smart Fiber-Optic Surveillance System for Pipeline Integrity Threat Detection: Architectural Issues and Blind Field Test Results,” J. Lightwave Technol. 36(4), 1052–1062 (2018). [CrossRef]
17. J. Tejedor, J. Macias-Guarasa, H. F. Martins, D. Piote, J. Pastor-Graells, S. Martin-Lopez, P. Corredera, and M. Gonzalez-Herraez, “A Novel Fiber Optic Based Surveillance System for Prevention of Pipeline Integrity Threats,” Sensors (Basel) 17(2), 355–373 (2017). [CrossRef] [PubMed]
18. Q. Sun, H. Feng, X. Yan, and Z. Zeng, “Recognition of a Phase-Sensitivity OTDR Sensing System Based on Morphologic Feature Extraction,” Sensors (Basel) 15(7), 15179–15197 (2015). [CrossRef] [PubMed]
19. D. Tan, X. Tian, W. Sun, Y. Zhou, L. Liu, Y. Ma, J. Meng, and H. Zhang, “An Oil & Gas Pipeline Pre-warning System Based on Φ-OTDR,” Proc. SPIE 9157, 91578W (2014). [CrossRef]
20. A. V. Makarenko, “Deep Learning Algorithms for Signal Recognition in Long Perimeter Monitoring Distributed Fiber Optic Sensors,” in IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP) (IEEE, 2016), pp. 1–6. [CrossRef]
21. M. Aktas, T. Akgun, M. U. Demircin, and D. Buyukaydin, “Deep Learning Based Multi-threat Classification for Phase-OTDR Fiber Optic Distributed Acoustic Sensing Applications,” Proc. SPIE 10208, 102080G (2017).
23. D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, “Mastering the game of Go with deep neural networks and tree search,” Nature 529(7587), 484–489 (2016). [CrossRef] [PubMed]
24. E. Ronen and O. Shamir, “The Power of Depth for Feedforward Neural Networks,” arXiv:1512.03965 (2015).
25. Y. Chen, J. Li, H. Xiao, X. Jin, S. Yan, and J. Feng, “Dual Path Networks,” in 31st Conference on Neural Information Process Systems (NIPS, 2017).
26. ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [online], available at http://image-net.org/challenges/LSVRC/2017/results (2017).
27. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2016), pp. 770–778.
28. G. Huang, Z. Liu, L. Maaten, and K. Q. Weinberger, “Densely Connected Convolutional Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2017), pp. 2261–2269.
29. S. Xie, R. Girshick, P. Dollár, Z. Tu, and K. He, “Aggregated Residual Transformations for Deep Neural Networks,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2017), pp. 5987–5995. [CrossRef]
30. Z. Wang, L. Li, H. Zheng, J. Liang, X. Wang, B. Lu, Q. Ye, H. Cai, and R. Qu, “Smart Distributed Acoustics/Vibration Sensing with Dual Path Network,” in 26th International Conference on Optical Fiber Sensors (Optical Society of America, 2018), paper WF105.
31. Z. Wang, L. Zhang, S. Wang, N. Xue, F. Peng, M. Fan, W. Sun, X. Qian, J. Rao, and Y. Rao, “Coherent Φ-OTDR based on I/Q demodulation and homodyne detection,” Opt. Express 24(2), 853–858 (2016). [CrossRef] [PubMed]
32. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet Classification with Deep Convolutional Neural Networks,” in 25th International Conference on Neural Information Processing Systems (Association for Computing Machinery, 2012), pp. 1097–1105.
33. N. S. Keskar, D. Mudigere, J. Nocedal, M. Smelyanskiy, and P. T. P. Tang, “On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima,” arXiv: 1609.04836 (2016).