Generative adversarial networks and faster-region convolutional neural networks based object detection in X-ray baggage security imagery

Jongchol Kim; Jiyong Kim; Jinmyong Ri

doi:10.1364/OSAC.412523

1. Introduction

X-ray imaging is one of the widely known methods in non-destructive evaluation as a notable application in security systems [1]. In particular, X-ray security baggage inspection systems are an essential part of maintaining public transportation security and are installed at almost all stations and airports [2]. The X-ray image is a shadow image corresponding to the perspective projection of an object. Therefore, detecting objects in an X-ray image is a complex task for human inspectors or computers. X-ray security inspection has always been a challenge because of (1) The complexity of the contents of the individual baggage, (2) the different imaging modes, and (3) the decrease in screeners’ alertness when constantly looking at the screen and repeatedly viewing the almost same type of objects.

Establishing an efficient and intelligent security inspection system is crucial to promoting the safe operation of civil aviation and ensuring the safety of passengers [3]. Previous work within this context is primarily based on the bag of visual words model (BoVW) [4–7]. Recently, Convolutional Neural Network (CNN) models have shown strong performance on image classification and object detection [8–12]. CNN, a state-of-the-art paradigm for pattern recognition problem solving, has been introduced into the field of X-ray baggage security screening. We compare CNN to a BoVW approach with conventional hand-crafted features trained with a Support Vector Machine (SVM) classifier [2,13]. Xu et al. proposed a CNN-based method with an attention mechanism to automatically recognizing and localizing prohibited items in X-ray security images [14]. Considering some characters of security imagery, An et al. proposed a segmentation net with a novel dual attention, which could capture richer features for refining the segmentation results [15].

Although CNN models have attracted more and more attention in image content analysis and may have good performance in detecting prohibited items, the dataset of X-ray prohibited item images used for training cannot meet the network training requirements. In addition, it is actually difficult to collect enough X-ray images containing various prohibited items.

In this paper, we propose a method of enlarging the X-ray security inspection image database based on Generative Adversarial Networks [16]. GANs is one of the most popular neural network models used recently for image generation [17–22]. Self-attention GAN(SAGAN) [23], BigGAN [24], Wasserstein GAN(WGAN) [25], and progressive growing GAN (PGGAN) [26] can be used to generate the images with high resolution and rich diversity. Some GAN models, such as Pix2Pix GAN [27] and Cycle GAN [28], can also be used for image-to-image translation. The GAN-based image generation method can be used for data augmentation when the number of samples is insufficient, and its feasibility has been proven in several literatures [29–32]. Therefore, we focus on generating new images with better quality and diversity.

In this work, in order to improve the quality of the generated images, an image was created using an improved GANs-based method, and compared with the image generated in other GAN models through Fréchet Inception Distance(FID) score [33]. In addition, we evaluated the performance of the proposed method in the detection of three dangerous objects (handguns, razors, shuriken) using Faster R-CNN. Experimental results on the model performance and the effect of the data augmentation method show that the new augmented X-ray security inspection image dataset can improve the detector performance.

The rest of the paper is presented as follows. In section 2, we describe the background knowledge related to the GANs and an improved loss function. In section 3, the proposed algorithm and object detection method using Faster R-CNN are presented. In section 4, presents the experimental results. Finally, discussions and the conclusion are in section 5.

2. Data augmentation method based on Generative Adversarial Networks

In this section, we introduce the images generation model based on the GANs for data augmentation. GANs in the paper is shown in Fig. 1. The architecture of Generator (G) and Discriminator (D) is built on the basis of a full convolution encorder and decoder. To improve the visual quality of the generated image, we use the improved loss function and select the appropriate parameters for training.

Fig. 1. The structure of Generative Adversarial Networks in this paper.

Download Full Size | PDF

GANs is the most actively used algorithm in the field of generative model among unsupervised training models that can model the probability distribution of the data and generate infinite data sharing the probability distribution of the real data. GANs consists of pairs of discriminator and generator, and is an algorithm that completes a generator using adversarial network. In other words, generator and discriminator repeat adversarial competitive learning and ultimately complete a generation model that prevents the discriminator from distinguishing the data synthesized by the generator. In GANs, the competition between the generator and the discriminator is trained in the direction of solving the min-max problem and can be defined by the following equation.

(1)$$\mathop {\min }\limits_G \mathop {\max }\limits_D {V_{GAN}}(D,G) = {E_{x\sim {P_{data}}(x)}}[\log D(x)] + {E_{z\sim {P_z}(z)}}[\log (1 - D(G(z)))].$$

Where, ${P_{data}}(x)$ stands for the real data x distribution and ${P_z}(z)$ stands for the model distribution. Discriminator is trained in the direction of maximizing the objective function ${V_{GAN}}(D,G)$, in contrast, Generator is trained to minimize ${V_{GAN}}(D,G)$. Through this learning process, generator G and discriminator D alternately optimize the necessary generation and discrimination networks until they reach the equilibrium point. When the discriminator is viewed as a classifier, the objective function of GANs generally adopt sigmoid cross entropy loss. However, when updating the generator, the loss function is on the correct side of the decision boundary, but it causes the problem that the gradient disappears for samples where the generated image is far from the real image. This means that the quality of the generated image is not high. To solve this problem, Mao et al. [34] improved the image quality using the Least Squares Generative Adversarial Network. The objective function is defined as follows:

(2)$$\mathop {\min }\limits_G {V_{GAN}}(G) = \frac{1}{2}{E_{z\sim {P_\textrm{z}}(Z)}}[{(D(G(z)) - 1)^2}]$$

(3)$$\mathop {\min }\limits_D {V_{GAN}}(D) = \frac{1}{2}{E_{x - {P_{\textrm{data}}}(x)}}[{(D(x) - 1)^2}] + \frac{1}{2}{E_{z - Pz(Z)}}[D{(G(z))^2}].$$

When training using convolutional neural network, one of the most important problems is how to set the loss function. This is because the performance of the model changes depending on how the loss function is set. Since the training performance of the previous layer has a great influence on the latter layer, the generator is required to be very stable. During the generator training process, the weight is easy to update too fast, and the weight changes greatly each time, the weight can increase explosively. Therefore, to avoid this problem, a gradient penalty [18] is introduced, and its definition is as follows:

(4)$${L_{gp}} = {E_{\hat{x}\sim {P_{\hat{x}}}}}[{(||{\nabla _{\hat{x}}}D(\hat{x})|{|_2} - 1)^2}],$$

where $\hat{x}$ is uniformly sampled along straight line between the pair of data points sampled from the real data distribution and the generator distribution, respectively. A similar regularization is described by Kodali et al. [35]. In order to sharpen the image prediction, we define GDL(Gradient Difference Loss) [36] that can be combined with the adversarial loss function by directly punishing the difference of image gradient prediction in the generation loss function.

(5)$${L_{gpl}} = \sum\limits_{i,j} {{{|{\textrm{|}{x_{ij}} - {x_{i - 1j}}|- \textrm{|}G{{(z)}_{ij}} - G{{(z)}_{i - 1j}}|} |}^\sigma } + } {|{\textrm{|}{x_{ij - 1}} - {x_{ij}}|- \textrm{|}G{{(z)}_{ij - 1}} - G{{(z)}_{ij}}|} |^\sigma },$$

where σ is an integer greater than or equal to 1. The overall objective function is a function that sums all the least squares loss, gradient penalty loss and gradient difference loss mentioned above. The proposed loss function is as follows:

(6)$${L_{gan}} = \mathop {\min }\limits_G \mathop {\max }\limits_D {L_{ls}} + \alpha {L_{gp}} + \beta {L_{gdl}}.$$

Here, α and β are coefficients that control the magnitude of the influence of each loss function. The criteria were set around the adversarial loss function. $\alpha$ is the balance factor for the gradient penalty loss, which is deﬁned as 10

3. Object detection based on faster region convolutional neural networks

In this article, the method of detecting a specific object in X-ray images consists of two steps. The proposed object detection scheme is shown in Fig. 2. First step is image generation and second step is object detection.

Fig. 2. Flow chart of the object detection scheme

Download Full Size | PDF

In first step, we can generate an image data set of high-quality and diversity with a small number of sample images through GANs that use least squares loss, gradient penalty loss, and gradient difference loss. In the second step, training dataset generated in the first step is applied to the algorithm learning of Faster R-CNN for object detection. By artificially increasing the limited amount of training data, the accuracy of learning is improved, and then we can get accurate detection results for inspection images.

Faster R-CNN is a classical deep learning algorithm, which has high recognition accuracy, efficiency and good recognition rate for the large target area [37,38]. Figure 3 shows the architecture of Faster R-CNN. Faster R-CNN [38] is an object detection method based on R-CNN [37] and Fast R-CNN [10]. The main difference between Faster R-CNN and the two methods is the method of candidate region generation.

Fig. 3. Faster R-CNN features architecture.

Download Full Size | PDF

SPPNet (Spatial Pyramid Pooling in Deep Convolutional Networks) [39] and Fast R-CNN [10] have selected a region proposal using the Selective Search Algorithm [40]. However, although the performance of the Selective Search Algorithm is high in the overall evaluation, it is difficult to say that it is a good method in terms of computation time [41]. Faster R-CNN proposed a new method of constructing a neural network for selection of the region proposals into neural networks as shown in Fig. 4.

Fig. 4. The architecture of region proposal network.

Download Full Size | PDF

Faster R-CNN algorithm consists of two modules: the Fast R-CNN object detection module and the RPN (Region Proposal Network) extraction module. RPN is composed of neural networks such as convolutional layer and fully-connected layer, so learning is possible. In addition, fast calculation is possible by using GPU operation. RPN receives a 256-dimensional or 512-dimensional feature vector from the feature extractor and creates an intermediate layer through the sliding window. And then, it convolutions into a classifier layer and a regressor layer. The classifier layer applies 1×1 filter to get the output. In this layer, k anchor box with various scales and ratios is generated through the sliding window, and two scores indicating the existence of objects are assigned. 1×1 filter is also applied to the regressor layer, k anchor boxes are generated, and 4 coordinate values for displaying the coordinates of the bounding box are assigned. The output from the two layers is input to the fully-connected layer through the RoI Pooling layer to train. Faster R-CNN solves the computational problems and structural problems of R-CNN and Fast R-CNN by using RPN and RoI pooling layer, and greatly improves accuracy and computation speed.

4. Experimental and result

In this section, the experimental setup is first introduced, and the details of the experiments are provided. Finally, the experimental results are analyzed and discussed.

4.1 Experimental setup

Baggage X-ray images taken from the GDXray database were used in the study [42]. The dataset captures single-target and multi-target X-ray security images of guns, knives, shuriken, and other dangerous goods from multiple angles. Some training and testing images are given in Fig. 5.

Fig. 5. Training and testing images in baggage X-ray images.

Download Full Size | PDF

If it is used to train the CNN model, there are the following disadvantages: (1) The samples set is very small. There are few single-target images used to train the classification network, and it is easy to cause overfitting; (2) The background is monotonous. Therefore, we performed data augmentation on the GDX-ray dataset. In the second stage, training of Fast RCNN is done using the augmented data obtained in the first step. The training of Faster R-CNN was implemented on CPU: Intel Core i5-8400H 2.5GHz, and GPU: GTX 1660 with memory 12G. Software tools used include MATLAB R2019a and python.

4.2 Experimental result

4.2.1 Generated image quality evaluation

In this paper, we select the FID(Fréchet inception distance) evaluation method [33] to compare the generated image quality. FID score is a prevailing method of evaluating the generated GAN images, and performs well in terms of discriminability, robustness and efficiency [43,44]. Here, the FID score is used to compare the performance of the images generated by DCGAN, SAGAN, WGAN and the proposed method, the formula is shown in Eq. (7):

(7)$$FID(x,{x^\ast }) = ||{\mu _x} - {\mu _{{x^\ast }}}||_2^2 + Tr({C_x} + {C_{{x^\ast }}} - 2{({C_x}{C_{{x^\ast }}})^{1/2}}),$$

where ${\mu _x},{\mu _{{x^\ast }}},{C_x},{C_{{x^\ast }}}$ respectively represent means and covariance of the original data and generated data. The FID score is the amount that shows the difference between the real image and the generated image, the better images should have a lower FID score.

An experiment was conducted to generate a dataset using the GAN with the improved loss function mentioned in section 2. Training was conducted including the GDX-ray dataset, and a total of 1038 images were used as training data. Both the generative model and the adversarial model use the back-propagation algorithm to adjust the parameters. The weight and bias parameters were randomly generated, and the hidden layer was activated by Leaky ReLU. Table 1 show the structure of the generator and discriminator.

Table 1. The structure of the generator and the discriminator.

View Table | View all tables in this article

In this paper, the learning rate of the neural network is configured as 0.0002, the number of iterations is 20000 times, and the batch-size is 64. In the experiment, the convolution kernel is set to be 5*5(Table 2).

Table 2. The related parameters.

View Table | View all tables in this article

As shown in Fig. 6 and Fig. 7, some images of different X-ray prohibited items are generated by proposed model. From 3000 iterations, it was confirmed that images with the form of dangerous objects began to be generated. The quality of the generated images is very close to the real images shown in Fig. 5. These 3 classes of generated images have photorealistic quality.

Fig. 6. Generative image results of 3000 iterations.

Download Full Size | PDF

Fig. 7. Generative image results of 18000 iterations.

Download Full Size | PDF

Apart from the visual quality, we also compare the FID scores of four GANs. Table 3 presents the FID scores of the four GAN models. From the table, we can see that the proposed method has lower FID scores than other models. This indicates that the proposed model can approximate the real image distribution after improving the loss function and parameter settings.

Table 3. FID score comparision for differnet GAN models.

View Table | View all tables in this article

4.2.2 Object detection

In order to verify the generated dataset, the object detection system discussed in section 3, faster R-CNN was used to detect objects in the x-ray image. The data generated through the GAN model was synthesized with real data, the generated data was added to the training dataset. For these datasets with generated images, all the generated images were placed in the training set, and all the images in the testing set were from the initial dataset. The flowchart of the data augmentation method is shown in Fig. 2. In [38], as a criterion for evaluating Faster R-CNN, the AP(Average Precision) of each single category of the Pascal VOC 2007 Challenge and mAP(mean Average Precision) considering all the categories were used. In the work, we use the same criteria to evaluate the performance. The mean Average Precision is mathematically defined as follows [45]:

(8)$$mPA = \frac{1}{N}\sum\limits_{k = 1}^n {p(k)\Delta r(k)}.$$

In Eq. (8), N is the number of categories, n is the number of reference thresholds, and k is the threshold. p(k) is the precision rate, and r(k) is the recall rate. mAP is the average of APs for all categories. The higher both AP and mAP, the better the classifier performance is. Figure 8 shows the detection results for some images. As shown in Fig. 8, different threat objects in the image have been detected. In the same image, two shurikens were correctly detected. In addition, shuriken and handgun were detected as two different objects in the image.

Fig. 8. Object Detection results of some objects.

Download Full Size | PDF

The proposed method was compared with the real dataset and the dataset generated by different methods.

As shown in Table 4, Application results show that the proposed method is better than other methods. We believe that using more generated images for model training can further improve classification accuracy.

Table 4. Results from different detection and image dataset.

View Table | View all tables in this article

5. Discussion and conclusion

The primary motivation for this work is to realize the potential of image processing and visualization algorithms to improve the efficiency of detecting hidden and threat items in baggage inspection as compared to unknown images. In this paper, a method of generating a new X-ray image data set using the improved GANs was proposed and the generated data set was verified. As mentioned earlier, various emphasis on establishing an intelligent transportation security system and ensuring the safety of transportation and passengers has been extensively investigated [2–18]. Recently, image classification and object detection using deep learning convolutional neural networks in X-ray security inspection have attracted a lot of attention, but it is difficult to collect sufficient X-ray images containing various prohibited items used for training. Therefore, we used the improved GAN model to generate new images with better quality and diversity. The FID score and Faster R-CNN were used to verify the generated X-ray image dataset. As a result of verifying the generated training dataset through object detection, an average of 99.83% was obtained. If more image datasets are used for training, more realistic results can be obtained. This is because deep learning algorithms require diverse datasets for more accurate result. In this work, we solved this problem by generating a new X-ray image dataset. It seems likely that more advanced results can be obtained by adding the technology to actualize the data before training. Using the proposed method, it is possible to generate various data sets in an experiment, so it is possible to develop this method in the future.

Acknowledgments

None declared.

Disclosures

The authors declare no conflicts of interest.

We declare that we have no financial and personal relationships with other people or organizations that can inappropriately influence our work, there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, the manuscript entitled, “Generative Adversarial Networks and Faster-Region Convolutional Neural Networks Based Object Detection in X-ray Baggage Security Imagery”.

References

1. G. Zentai, “X-ray imaging for homeland security,” IEEE International Workshop on Imaging Systems and Techniques (IST2008), Chania, Greece, Sep.2008, pp.1–6. https://dx.doi.org/10.1109/IST.2008.4659929

2. S. Akcay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, “Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery,” IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, pp. 1057–1061(2016). https://dx.doi.org/10.1109/ICIP.2016.7532519

3. D. Turcsany, A. Mouton, and T. P. Breckon, “Improving feature-based object recognition for X-ray baggage security screening using primed visual words,” IEEE International Conference on Industrial Technology (ICIT), Cape Town, 2013, pp. 1140–1145. https://dx.doi.org/10.1109/ICIT.2013.6505833

4. M. Baştan, M. R. Yousefi, and T. M. Breuel, “Visual words on baggage X-ray images,” Computer Analysis of Images and Patterns - 14th International Conference(CAIP 2011), Seville, Spain, Aug. 2011, pp. 360–368. https://dx.doi.org/10.1007/978-3-642-23672-3_44

5. M. Bastan, W. Byeon, and T. M. Breuel, “Object recognition in multi-view dual energy X-ray images,” in Proc. BMVC, 2013, pp. 130–131. https://dx.doi.org/10.1007/978-3-642-32717-9_15

6. M. E. Kundegorski, S. Akçay, M. Devereux, A. Mouton, and T. P. Breckon, “On using feature descriptors as visual words for object detection within X-ray baggage security screening,” International Conference on Imaging for Crime Detection & Prevention, Nov. 2016, p. 12–18. https://dx.doi.org/10.1049/ic.2016.0080

7. M. Baştan, “Multi-view object detection in dual-energy X-ray images,” Machine Vision and Applications 26(7-8), 1045–1060 (2015). [CrossRef]

8. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Boston, MA, USA, Jun. 2015, pp. 1–9. https://dx.doi.org/10.1109/CVPR.2015.7298594

9. C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 2818–2826. https://dx.doi.org/10.1109/CVPR.2016.308

10. R. Girshick, “Fast R-CNN,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Santiago, Chile, Dec. 2015, pp. 1440–1448. https://dx.doi.org/10.1109/ICCV.2015.169

11. J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Las Vegas, NV, USA, Jun. 2016, pp. 779–788. https://dx.doi.org/10.1109/CVPR.2016.91

12. W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C. Y. Fu, and A. C. Berg, “SSD: Single shot multibox detector,” in Proc. Eur. Conf. Comput. Vis. (ECCV), Amsterdam, The Netherlands, Oct. 2016, pp. 21–27. https://dx.doi.org/10.1007/978-3-319-46448-0_2

13. S. Akçay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, “Using Deep Convolutional Neural Network Architectures for Object Classification and Detection Within X-Ray Baggage Security Imagery,” IEEE Trans. Inform. Forensic Secur. 13(9), 2203–2215 (2018). [CrossRef]

14. M. Xu, H. Zhang, and J. Yang, “Prohibited item detection in airport X- ray security images via attention mechanism based CNN,” Chinese Conference on Pattern Recognition & Computer Vision, Guangzhou, China, 2018, pp. 429–439. https://dx.doi.org/10.1007/978-3-030-03335-4_37

15. J. An, H. Zhang, Y. Zhu, and J. Yang, “Semantic segmentation for prohibited items in baggage inspection,” in Proc. Int. Conf. Intell. Sci. Big Data Eng. (ISCIDE), Nanjing, China, Oct. 2019, pp. 495–505. https://dx.doi.org/10.1007/978-3-030-36189-1_41

16. I. Goodfellow, J. P. Abadie, M. Mirza, B. Xu, and D. W. Farley, “Generative Adversarial Nets,” Advances in neural information processing systems2014, pp. 2672. https://arxiv.org/abs/1406.2661

17. A. Radford, L. Metz, and S. Chintala, “Unsupervised representation learning with deep convolutional generative adversarial networks,” Computer Ence, 2015. [Online]. Available: http://arxiv.org/abs/1511.06434

18. I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville, “Improved training of Wasserstein GANs,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, Dec. 2017, pp. 5767–5777. https://arxiv.org/abs/1704.00028

19. Z. Ma, Y. Lai, W. B. Kleijn, Y. Song, L. Wang, and J. Guo, “Variational bayesian learning for dirichlet process mixture of inverted dirichlet distri- butions in nongaussian image feature modeling,” IEEE Transactions on Neural Networks and Learning Systems 30(2), 449–463 (2019). [CrossRef]

20. S. Gurumurthy, R. K. Sarvadevabhatla, and R. V. Babu, “DeLiGAN: generative adversarial networks for diverse and limited data,” IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, 2017, pp. 4941–4949. https://dx.doi.org/10.1109/CVPR.2017.525

21. M. Mirza and S. Osindero, “Conditional generative adversarial nets,” 2014, pp. 1–7, [Online]. https://arxiv.org/abs/1411.1784

22. T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen, “Improved techniques for training GANs,” International Conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 2234–2242. http://papers.nips.cc/paper/6125

23. H. Zhang, I. Goodfellow, D. Metaxas, and A. Odena, “Self-attention generative adversarial networks,” 36th International Conference on Machine Learning, Long Beach, California, PMLR 97, 2019, pp.1–10. http://arxiv.org/abs/1805.08318

24. A. Brock, J. Donahue, and K. Simonyan, “Large scale GAN training for high fidelity natural image synthesis,” ICLR2019, pp. 1-35. http://arxiv.org/abs/1809.11096

25. M. Arjovsky, S. Chintala, and L. Bottou, “Wasserstein GAN,” 2017, pp. 1–32, [Online]. Available: http://arxiv.org/abs/1701.07875

26. T. Karras, T. Aila, S. Laine, and J. Lehtinen, “Progressive growing of GANs for improved quality, stability, and variation,”, ICLR 2018, pp. 1–26, [Online]. Available: http://arxiv.org/abs/1701.10196

27. P. Isola, J. Zhu, T. Zhou, and A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Honolulu, HI, USA, Jul. 2017, pp. 1125–1134. https://dx.doi.org/10.1109/CVPR.2017.632

28. J. Zhu, T. Park, P. Isola, and A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proc. IEEE Int. Conf. Comput. Vis. (ICCV), Venice, Italy, Oct. 2017, pp. 2223–2232. https://dx.doi.org/10.1109/ICCV.2017.244

29. H. Shin, N. A. Tenenholtz, J. K. Rogers, C. G. Schwarz, M. L. Senjem, J. L. Gunter, K. Andriole, and M. Michalski, “Medical image synthesis for data augmentation and anonymization using generative adversarial networks,” International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI), Granada, Spain, 2018, pp. 1–11. https://dx.doi.org/10.1007/978-3-030-00536-8_1

30. M. Frid-Adar, E. Klang, M. Amitai, J. Goldberger, and H. Greenspan, “Synthetic data augmentation using GAN for improved liver lesion classification,” IEEE International Symposium on Biomedical Imaging (ISBI), Washington, DC, 2018, pp. 289–293. http://arxiv.org/abs/1801.02385

31. T. Tran, T. Pham, G. Carneiro, L. Palmer, and I. Reid, “A Bayesian data augmentation approach for learning deep models,” in Proc. Int. Conf. Neural Inf. Process. Syst. (NIPS), Long Beach, CA, USA, Dec. 2017, pp. 2797–2806. https://arxiv.org/abs/1710.10564

32. F. Calimeri, A. Marzullo, C. Stamile, and G. Terracina, “Biomedical data augmentation using generative adversarial neural networks,” in Proc. Int. Conf. Artif. Neural Netw. (ICANN), Alghero, Italy, Sep. 2017, pp. 626–634. https://dx.doi.org/10.1007/978-3-319-68612-7_71

33. Q. Xu, G. Huang, Y. Yuan, C. Guo, Y. Sun, F. Wu, and K. Q. Weinberger, “An empirical study on evaluation metrics of generative adversarial networks,” 2018, [Online]. Available: http://arxiv.org/abs/1806.07755

34. X. Mao, Q. Li, H. Xie, R. Lau, Z. Wang, and S. Smolley, “Least squares generative adversarial networks,” Computer Vision and Pattern Recognition 1, 2813–2821 (2017).

35. K. Naveen, A. Jacob, H. James, and K. Zsolt, “On Convergence and Stability of GANs,” 2018, pp. 1–18, [Online]. Available: http://arxiv.org/abs/1705.07215. http://arxiv.org/abs/1705.07215

36. M. Mathieu M, C. Couprie, and Y. LeCun, “Deep multi-scale video prediction beyond meansquare error,” ICLR 2016, pp. 1–14, [Online]. Available: http://arxiv.org/abs/1511.05440

37. R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Columbus, OH, USA, Jun. 2014, pp. 580–587. https://dx.doi.org/10.1109/CVPR.2014.81

38. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: Towards real-time object detection with region proposal networks,” in Proc. Adv. Neural Inf. Proces. Syst., Montreal, QC, Canada, 2015, pp. 91–99. https://dx.doi.org/10.1109/TPAMI.2016.2577031

39. S. Ren, K. He, R. Girshick, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1904–1916 (2015). [CrossRef]

40. J. Uijlings, K. van de Sande, T. Gevers, and A. Smeulders, “Selective Search for Object Recognition,” in IEEE International Conference on Computer Vision (ICCV), 2012, pp.154–172. https://dx.doi.org/10.1007/s11263-013-0620-5

41. J. Hosang, R. Benenson, and B. Schiele. “How good are detection proposals, really?” 2014. [Online] https://dx.doi.org/10.5244/C.28.24

42. D. Mery, V. Riffo, U. Zscherpel, G. Mondragón, I. Lillo, I. Zuccar, H. Lobel, and M. Carrasco, “GDXray: The database of X-ray images for nondestructive testing,” J. Nondestruct. Eval. 34(4), 42 (2015). [CrossRef]

43. M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, “GANs trained by a two time-scale update rule converge to a local nash equilibrium,” in Proc. Adv. Neural Inf. Process. Syst., 2017, pp. 6626–6637. http://arxiv.org/abs/1706.08500v1

44. Z. Ma, A. E. Teschendorff, A. Leijon, Y. Qiao, H. Zhang, and J. Guo, “Variational bayesian matrix factorization for bounded support data,” IEEE Trans. Pattern Anal. Mach. Intell. 37(4), 876–889 (2015). [CrossRef]

45. M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, “The Pascal Visual Object Classes Challenge,” Machine Learning Challenges, Evaluating Predictive Uncertainty, Visual Object Classification and Recognizing Textual Entailment, MLCW 2005, Southampton, UK, April pp. 117–176 (2005). https://dx.doi.org/10.1007/11736790_8

The layer number	Generator structure		Discriminator structure
The layer number	type	Image size	type	Image size
1	Input	1*128 random noise	Input	1281283
2	Convolution	441024	Convolution	646464
3	Convolution	88512	Convolution	3232128
4	Convolution	1616256	Convolution	1616256
5	Convolution	3232128	Convolution	88512
6	Convolution	646464	Convolution	441024
7	output	1281283	output	0∼1 probability value

Parameters	Value
Activation Function	ReLU
Learning rate	0.002
Batch size	64 & Batch Normalization
Iterations	20000

Model	Handgun	Shuriken	Razor
DCGAN	213.32	187.19	239.28
SAGAN	179.16	160.44	193.21
WGAN	157.83	149.83	168.37
Proposed	106.67	101.34	115.45

Faster R-CNN results	mAP(%)	Handgun(AP)	Shuriken(AP)	Razor(AP)
Initial dataset	85.77	81.35	88.65	87.32
DCGAN dataset	90.16	88.29	91.83	90.38
WGAN dataset	88.54	86.75	90.45	88.43
Proposed	91.30	90.97	92.23	91.41

The layer number	Generator structure		Discriminator structure
The layer number	type	Image size	type	Image size
1	Input	1*128 random noise	Input	1281283
2	Convolution	441024	Convolution	646464
3	Convolution	88512	Convolution	3232128
4	Convolution	1616256	Convolution	1616256
5	Convolution	3232128	Convolution	88512
6	Convolution	646464	Convolution	441024
7	output	1281283	output	0∼1 probability value

Generative adversarial networks and faster-region convolutional neural networks based object detection in X-ray baggage security imagery

Abstract

1. Introduction

2. Data augmentation method based on Generative Adversarial Networks

3. Object detection based on faster region convolutional neural networks

4. Experimental and result

4.1 Experimental setup

4.2 Experimental result

4.2.1 Generated image quality evaluation

4.2.2 Object detection

5. Discussion and conclusion

Acknowledgments

Disclosures

References

Cited By

Figures (8)

Tables (4)

Equations (8)

OSA Continuum