The energy-dispersive X-ray diffraction (EDXRD) technique can be a more practical and accurate technique for security applications such as detecting drugs and explosives. Here, an accurate multivariate discriminant analysis (MDA) method is used to identify the EDXRD spectra of illicit contraband. MDA is a comprehensive algorithm that combines the principal component analysis algorithm, the spectral angle matching method, and the correlation coefficient method. Experiments are performed to acquire the diffraction spectra of drugs and common daily necessities. An accurate identification model can then indicate which substance in an already established database an unknown substance corresponds to. Even when the object is shielded, the concealed substance can be correctly identified, and the identification accuracy is substantially improved compared with other algorithms.
© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Countries worldwide face security threats from drug smuggling and gunrunning. A non-destructive and accurate means of inspecting express deliveries and baggage to identify contraband, such as drugs and explosives, is therefore urgently needed. Owing to the strong penetrability of X-rays, X-ray detection has become the main technology in the field of security inspection. X-ray transmission and X-ray backscattering techniques are usually used to obtain information such as the shape, density, and atomic number of the substances in inspected objects [1,2]. However, both techniques have difficulty distinguishing substances with similar densities and atomic numbers, viz. distinguishing contraband from common daily necessities. X-ray diffraction techniques, including angular-dispersive X-ray diffraction (ADXRD) and energy-dispersive X-ray diffraction (EDXRD), can accurately identify substances by their molecular structures and are thus used to detect explosives and drugs [3,4]. Compared with ADXRD, EDXRD uses a polychromatic source, which is readily available and produces photons over a range of energies. Because the energies of the diffracted X-rays are measured at a fixed scattering angle (photon energy and wavelength being related by Planck's relation E = hc/λ), the components of an EDXRD system do not need to rotate around the sample, which provides a theoretical basis for fast measurement. EDXRD is therefore an effective means of accurately detecting and identifying contraband.
The diffraction spectrum in EDXRD has characteristic peaks whose intensities and energies correspond to different crystal structures. A reasonable and correct analysis of these peaks, based on an effective recognition algorithm, is the key to identifying the detected substance. Several analytical methods for interpreting diffraction spectra have been proposed (e.g., algorithms based on the energies of the main peaks, algorithms that calculate the correlation between diffraction spectra, and principal component extraction from spectra) [6–8]. The recognition result of any such single algorithm is easily affected by factors such as the system parameters of the experimental equipment and the detection time. Moreover, training the identification model requires a large amount of sample data, and the recognition error rate is high.
This study presents a multivariate discriminant analysis (MDA) method that combines the principal component analysis (PCA) algorithm with Euclidean distance judgment, the spectral angle matching (SAM) method, and the correlation coefficient method. We performed energy-dispersive X-ray diffraction experiments on samples of common daily necessities (e.g., NaCl and paracetamol) and built a database. Each test spectrum is compared with the sample spectra in the database using MDA and the three single algorithms, and the relative similarity of the most similar substances in the database is calculated. Compared with SAM and the correlation coefficient method, MDA multiplies the difference in relative similarity between the most similar substance and the other substances. MDA still identifies the diffraction spectrum accurately when the test material is occluded. In addition, diffraction experiments on five drugs were conducted to establish a database and verify whether contraband can actually be identified; the identification results for a drug test spectrum confirm the accuracy of MDA. Moreover, an effective threshold can be set so that amorphous substances that are not in the database, such as flour, can be distinguished.
2. Theory of MDA
As mentioned earlier, MDA combines the PCA algorithm with Euclidean distance judgment, the SAM method, and the correlation coefficient method to analyze and identify profiles. Based on the characteristics of the three algorithms, MDA adjusts the weight of each algorithm by training on diffraction spectra to improve identification accuracy. In this way, the main information of the characteristic peaks of the two compared spectra can be extracted to form an accurate recognition model:
The diffraction energy spectrum data obtained from the experiment can be regarded as a high-dimensional matrix. Computing directly on this matrix is expensive, and redundant signals degrade feature identification. Therefore, PCA is used to extract the eigenvalues and eigenvectors of the data covariance and reduce the data dimension. The principal components represent most of the information of the original variables, and the information they contain is not repeated, because the components are mutually uncorrelated [10,11]. Used appropriately, PCA is a good means of data analysis and dimension reduction in the identification process. Assume that the database contains n samples and that the diffraction spectrum of each sample has m features. Each diffraction spectrum can then be regarded as a one-dimensional array of m values, and the database as a two-dimensional matrix Y of size n × m, where m = 1024 (the detector outputs diffraction spectra with 1024 channels, each channel corresponding to a fixed energy). The covariance matrix C (of size m × m) of Y can be calculated as:
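With Yc denoting Y after subtracting each column's mean, the usual sample covariance is C = Ycᵀ Yc / (n − 1); the exact normalization used in this work is not reproduced here. The sketch below illustrates PCA followed by Euclidean distance judgment under stated assumptions: the function name `pca_euclidean_match` and the number of retained components `k` are hypothetical, and the database is centered before projection. It is an illustration, not the authors' implementation.

```python
import numpy as np


def pca_euclidean_match(Y, test, k=5):
    """Rank database spectra by Euclidean distance in a k-dimensional PCA space.

    Y    : (n, m) database matrix, one spectrum per row (m = 1024 channels here)
    test : (m,) spectrum of the unknown sample
    k    : number of principal components kept (hypothetical choice)
    Returns the database row indices sorted from nearest to farthest,
    together with the corresponding distances.
    """
    Y = np.asarray(Y, dtype=float)
    mean = Y.mean(axis=0)                  # per-channel mean over the database
    Yc = Y - mean                          # centred database
    C = Yc.T @ Yc / (Y.shape[0] - 1)       # (m, m) sample covariance matrix
    _, eigvecs = np.linalg.eigh(C)         # C is symmetric; eigenvalues ascending
    W = eigvecs[:, ::-1][:, :k]            # eigenvectors of the k largest eigenvalues
    scores_db = Yc @ W                     # (n, k) database projected onto the PCs
    score_test = (np.asarray(test, dtype=float) - mean) @ W  # (k,) test projection
    dist = np.linalg.norm(scores_db - score_test, axis=1)    # Euclidean distances
    order = np.argsort(dist)               # nearest database spectrum first
    return order, dist[order]
```

Because the database in Sec. 4 holds n = 80 spectra while m = 1024, C has rank at most n − 1, so `k` should stay well below n.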
The SAM method measures the similarity between two spectra by treating each spectrum as a vector and calculating the generalized angle between the sample spectrum and the reference spectrum; this angle reflects the shape of the spectrum. Even if two spectra have similar numbers of diffraction peaks and similar intensities, SAM can extract a significant difference from the generalized angle between them. When a sample in the database is compared with the test sample, the corresponding row vector M = [M1, M2, …, M1024] of the matrix Y is extracted, and the diffraction spectrum of the test sample is N = [N1, N2, …, N1024]. The generalized angle between the two spectra can then be expressed as:
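In its standard form, the spectral angle is θ = arccos(Σ MᵢNᵢ / (√(Σ Mᵢ²) · √(Σ Nᵢ²))), with the sums running over all 1024 channels; a smaller θ means more similar spectra. A minimal Python sketch of that standard definition follows; the function name `spectral_angle` is chosen for illustration only.

```python
import numpy as np


def spectral_angle(M, N):
    """Spectral angle (radians) between reference spectrum M and test spectrum N.

    theta = arccos( sum(M_i * N_i) / (||M|| * ||N||) ); 0 means identical shape.
    """
    M = np.asarray(M, dtype=float)
    N = np.asarray(N, dtype=float)
    cos_theta = np.dot(M, N) / (np.linalg.norm(M) * np.linalg.norm(N))
    # clip guards against round-off pushing the cosine slightly outside [-1, 1]
    return float(np.arccos(np.clip(cos_theta, -1.0, 1.0)))
```

Scaling either spectrum by a positive constant leaves θ unchanged, so SAM compares peak shape rather than absolute intensity.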
The correlation coefficient method is based on the dispersion of two variables: the correlation between two spectra is obtained from the products of their deviations from their respective means. It also reflects the peak energy and intensity information of the spectra; the higher the correlation between the spectra, the closer the absolute value of the correlation coefficient is to 1:
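In the standard Pearson form, r = Σ(Mᵢ − M̄)(Nᵢ − N̄) / √(Σ(Mᵢ − M̄)² · Σ(Nᵢ − N̄)²), where M̄ and N̄ are the mean intensities of the two spectra. The sketch below implements that standard coefficient; the function name `pearson_r` is illustrative, and the paper's exact expression is not reproduced here.

```python
import numpy as np


def pearson_r(M, N):
    """Pearson correlation coefficient between two spectra of equal length."""
    M = np.asarray(M, dtype=float)
    N = np.asarray(N, dtype=float)
    dM = M - M.mean()  # deviations of each spectrum from its own mean
    dN = N - N.mean()
    return float(np.sum(dM * dN) / np.sqrt(np.sum(dM ** 2) * np.sum(dN ** 2)))
```

Unlike SAM, the Pearson coefficient subtracts each spectrum's mean, so adding a constant baseline offset to one spectrum does not change r.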
3. Experimental details
An experimental system was set up to acquire the spectra. Photons emitted by a polychromatic source irradiate the object, and the diffracted X-rays are received by an energy-resolving semiconductor detector. According to Bragg's law, when the diffraction angle is fixed, materials with different lattice structures yield characteristic spectra of diffraction intensity as a function of X-ray energy. Each lattice-plane spacing of the substance corresponds to a diffraction energy E. Bragg's law for energy-dispersive X-ray diffraction can be expressed as follows [14,15]:
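A common form of this relation is E = hc / (2d sin θ), where d is the lattice-plane spacing, 2θ is the scattering angle, and hc ≈ 12.398 keV·Å. The sketch below evaluates it numerically. It assumes that the nominal 5° quoted in Sec. 3 is the full scattering angle 2θ; the function names are hypothetical, and the NaCl (200) spacing of 2.820 Å (half the 5.640 Å lattice constant) is used only as an illustration.

```python
import math

HC_KEV_ANGSTROM = 12.39842  # h*c in keV·Å


def bragg_energy_kev(d_angstrom, scatter_angle_deg):
    """Diffraction-peak energy (keV) for plane spacing d at a fixed scattering angle 2θ."""
    theta = math.radians(scatter_angle_deg) / 2.0  # Bragg angle θ is half of 2θ
    return HC_KEV_ANGSTROM / (2.0 * d_angstrom * math.sin(theta))


def plane_spacing_angstrom(energy_kev, scatter_angle_deg):
    """Inverse relation: plane spacing d (Å) from a measured peak energy."""
    theta = math.radians(scatter_angle_deg) / 2.0
    return HC_KEV_ANGSTROM / (2.0 * energy_kev * math.sin(theta))


print(round(bragg_energy_kev(2.820, 5.0), 1))  # NaCl (200) at 2θ = 5°
```

Under this assumption, the NaCl (200) reflection falls near 50 keV, inside the 10–60 keV window used in Sec. 4; if 5° were instead the Bragg angle θ, the same reflection would fall near 25 keV.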
Figure 1 shows the geometry of the experimental system. A tungsten-target X-ray tube (NDI-225-22, Varian) operated at 80 kV and 25 mA irradiated the sample with a polychromatic beam. The measured spectrum of the tube is shown in Fig. 3. The X-ray source had a 5.5 mm focal spot, and the radiation coverage angle was 40°. The scattered photons were detected at a nominal scattering angle of 5° with a CdTe diode detector (XR-100T-CdTe, Amptek), whose energy resolution was less than 1.5 keV FWHM at 122 keV (⁵⁷Co). The X-rays passed through the primary collimation slits (P1, P2) and irradiated the sample under test; the diffracted X-rays passed through the secondary collimation slit and were received by the detector. The distance between P1 and P2 was 400 mm, and both slit apertures were 0.5 mm wide. The distance from the secondary collimation slit to P2 was 500 mm, and the detector viewed an effective range of approximately 55 mm along the optical axis. The distance from P2 to the center of the effective range was 280 mm.
4. Results and discussion
A database was created before the identification process. Using the recognition algorithm, the spectrum of the test sample was compared with the spectra in the database to determine the sample composition. During identification, the diffraction peak intensities were normalized so that the intensities of different spectra were compared on a relative scale, which improves identification accuracy. Normalization divides the spectral intensity by its maximum, that is, the intensity of the highest peak, mapping the data into the interval [0, 1]. The experimental profiles contained noise, which normalization would also amplify; therefore, the experimental spectra must be filtered. A Savitzky–Golay (S–G) filter was used to preserve the shape and width of the signal while reducing noise. Because the acquisition time was shortened to achieve fast detection, the S–G filter proved to be a good tool for mitigating the resulting degradation of spectral quality. Figure 2 shows the identification flow chart.
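The preprocessing described above (S–G smoothing followed by division by the highest peak) can be sketched as follows. This is a minimal NumPy-only illustration: the window length 11 and polynomial order 3 are hypothetical values, not the settings used in this work, the spectrum edges are handled by repeating the end values, and small negative values left by the filter are clipped to 0 so the result stays in [0, 1].

```python
import numpy as np


def savgol_smooth(y, window=11, polyorder=3):
    """Savitzky-Golay smoothing: least-squares fit of a degree-`polyorder`
    polynomial in a sliding window of odd length `window`, keeping the fitted
    value at the window centre. Defaults are illustrative assumptions."""
    if window % 2 == 0 or window <= polyorder:
        raise ValueError("window must be odd and larger than polyorder")
    half = window // 2
    x = np.arange(-half, half + 1)
    A = np.vander(x, polyorder + 1, increasing=True)  # columns x^0 ... x^polyorder
    coeffs = np.linalg.pinv(A)[0]                     # weights giving the fit at x = 0
    ypad = np.pad(np.asarray(y, dtype=float), half, mode="edge")  # repeat edge values
    # reversing the kernel turns np.convolve into a sliding weighted sum
    return np.convolve(ypad, coeffs[::-1], mode="valid")


def preprocess(counts, window=11, polyorder=3):
    """Smooth a raw spectrum, then scale it so the highest peak equals 1."""
    smooth = np.clip(savgol_smooth(counts, window, polyorder), 0.0, None)
    return smooth / smooth.max()
```

Because the S–G filter reproduces any polynomial up to `polyorder` exactly within the window, peak positions and widths are preserved better than with a plain moving average of the same length.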
4.1 Experimental result
Figures 3(a)–3(c) show the intensity profiles of a sample with a simple diffraction pattern (NaCl), a sample with a complex diffraction pattern (paracetamol), and the drug methcathinone, respectively. The samples were placed at the center of the effective range. The acquisition time was set to 20 s, and NaCl exhibited one diffraction peak. The diffraction peak energies of both paracetamol and methcathinone were concentrated between 20 and 50 keV, matching the data in the JCPDS database well. In addition, the experimental spectra showed that the diffraction peak energies of drugs and most daily necessities were concentrated in the 10–60 keV range. The experimental spectra were therefore truncated to the 10–60 keV range for identification, which also reduces the influence of electromagnetic noise. The filtered experimental spectra of NaCl, paracetamol, and methcathinone are also shown in Figs. 3(a)–3(c), respectively. When the sample is placed at different positions in the effective scattering volume, the characteristic peaks of the diffraction spectrum shift, but the deviation is small. A 5-mm-thick NaCl sample was placed at both ends and at the center of the effective scattering range, and the resulting diffraction spectra are shown in Fig. 3(d). As the distance between the sample and P2 increases, the diffraction peak shifts toward lower energy; within the effective range, however, the maximum deviation is only about 1 keV.
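To put the 1 keV shift in perspective, differentiating E = hc / (2d sin θ) at fixed d gives ΔE/E = −cot θ · Δθ. The sketch below converts a peak shift into the equivalent change of the effective scattering angle. It assumes that the shift is purely geometric, that 5° is the full scattering angle 2θ, and that the peak lies near 50 keV; the function name is hypothetical.

```python
import math


def bragg_angle_change_deg(energy_kev, shift_kev, scatter_angle_deg):
    """Change in scattering angle 2θ (degrees) implied by a peak shift at fixed d.

    From E = hc / (2 d sin θ): dE/E = -cot(θ) dθ, hence dθ = -(dE/E) tan(θ).
    """
    theta = math.radians(scatter_angle_deg) / 2.0
    dtheta = -(shift_kev / energy_kev) * math.tan(theta)
    return math.degrees(2.0 * dtheta)


print(round(bragg_angle_change_deg(50.0, -1.0, 5.0), 3))  # 1 keV shift toward low energy
```

Under these assumptions, a 1 keV shift near 50 keV corresponds to a change of only about 0.1° in the effective scattering angle, consistent with the small deviation observed across the effective range.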
4.2 Identification result
The database contained the spectra of 80 samples, including daily necessities (NaCl, soda, sucrose, some medicines, etc.). NaCl and paracetamol were used as test samples to compare the single algorithms with MDA. The test samples were 5 mm thick and were placed at the center of the effective range. When the spectrum was identified using the spectral angle matching method, the correlation coefficient method, or principal component analysis alone, the five substances in the database most similar to the test spectrum were output, and the similarity of the most similar substance was set to 1. Figures 4(a)–4(c) and 5(a)–5(c) show the normalized relative similarity of these five substances; the abscissa indicates the five substances most similar to the test sample, and the ordinate indicates the degree of similarity. Figures 4(d) and 5(d) show the results obtained with MDA. The difference value (D–V) is the difference in normalized similarity between the most similar and the second most similar substance in the database. For NaCl, the D–V was approximately 0.8 with MDA; the normalized similarity of the second substance was approximately 0.7 lower than with SAM and the correlation coefficient method, and approximately 0.3 lower than with PCA. A larger D–V makes it easier to set a threshold after data training to select the substance in the database most similar to the test sample. Because NaCl has fewer diffraction peaks than paracetamol, its D–V was larger. For paracetamol, the D–V still reached about 0.7 with MDA, a significant increase over the single algorithms.
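The relative similarity and the D–V can be computed as sketched below. The sketch assumes that relative similarity is obtained by dividing each score by the best score (so the best match is 1) and that larger scores mean greater similarity, which requires converting distance- or angle-based metrics first. The function name and the example scores are hypothetical.

```python
def relative_similarity_and_dv(scores, top=5):
    """Normalize similarity scores so the best match is 1 and return the D-V.

    scores : dict {substance name: similarity score}, larger = more similar
    Returns the `top` (name, relative similarity) pairs, best first, and
    D-V = relative similarity of the best minus that of the second best.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top]
    best = ranked[0][1]
    if best <= 0:
        raise ValueError("the best score must be positive to normalize by it")
    rel = [(name, s / best) for name, s in ranked]
    dv = rel[0][1] - rel[1][1] if len(rel) > 1 else 1.0
    return rel, dv


# hypothetical scores for one test spectrum
rel, dv = relative_similarity_and_dv({"NaCl": 0.9, "KCl": 0.45, "sucrose": 0.3, "soda": 0.2})
```

With these hypothetical scores, the second-ranked substance has relative similarity 0.5, so the D–V is 0.5.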
To analyze how the spectral shift caused by placing the sample at different positions within the effective scattering range affects the recognition results, we identified the NaCl spectra recorded at the three positions in Fig. 3(d) with MDA. As shown in Fig. 6, the small spectral shifts have little effect on the MDA recognition results, which remain consistent at all three positions.
The above results show that the diffraction spectrum can be correctly identified when the test sample is not obstructed. In reality, contraband is often concealed among various organic substances during transportation. Therefore, the spectra of NaCl and paracetamol were recorded behind a 5 mm PMMA occlusion attached to the front surface of the sample, and the identification results are shown in Figs. 7(a) and 7(b). The figure shows the normalized relative similarity of the most similar and the second most similar substance in the database for the four algorithms. With occlusion, the relative similarity of the second most similar substance increased greatly for the single algorithms, making it hard to distinguish from the most similar substance. MDA still maintained a D–V of approximately 0.7, which is beneficial for threshold determination. Compared with the identification result for the unobstructed NaCl spectrum [Fig. 4(d)], the D–V with MDA decreased from 0.85 to 0.7 under the 5 mm PMMA occlusion [Fig. 7(a)]. For paracetamol, the D–V with MDA was almost unchanged. Furthermore, we compared the identification results of the paracetamol sample behind PMMA occlusions of different thicknesses for the four algorithms. As the PMMA thickness increased, the D–V of the three single algorithms decreased significantly. For 10 mm of PMMA, the D–V of SAM and the correlation coefficient method was about 0.1 and that of PCA was about 0.3, whereas the D–V of MDA was about 0.7. For 15 mm of PMMA, the D–V of SAM and the correlation coefficient method was almost 0 and that of PCA dropped to around 0.2. By contrast, the D–V with MDA was still about 0.4 at 15 mm, although identification becomes relatively difficult at this thickness.
Table 1 lists whether the test samples (paracetamol, NaCl, sucrose, ibuprofen, and silica) were correctly identified by the four algorithms under occlusion by approximately 1 cm of clothes and 1 cm of books, respectively. Because books are denser than clothes, the three single algorithms produced more identification errors under book occlusion: SAM made one identification error, and the correlation coefficient method and PCA each made two. MDA, by contrast, maintained a 100% recognition rate.
To verify whether MDA is effective for drug identification, the diffraction spectra of five drugs (morphine, ketamine, N-acetylanthranilic acid, anthranilic acid, and methcathinone) were acquired and a database was established, with a diffraction angle of 5°, an acquisition time of 30 s, and approximately 0.2 g of drug in the effective detection region (volume 0.2 × 0.2 × 3 cm³). The test samples were methcathinone without occlusion and methcathinone placed in a suitcase and covered with clothes. Figure 8 shows the suitcase with the occluded drug, and Fig. 9 shows the identification results with MDA. The large D–V demonstrates that MDA can effectively find the substance in the database most similar to the test sample. The differences between the identification results with and without occlusion are mainly caused by the attenuation of the low-energy part of the diffraction spectrum and by Compton scattering in the obstruction.
Some powdered materials with amorphous structures (e.g., flour and milk powder, which have densities and atomic numbers similar to those of drugs) are hard to distinguish. Different sample data were used to train MDA, and the weights of the three algorithms in MDA were adjusted to determine whether materials such as flour, which are not in the database, can be distinguished. In this manner, a threshold can be found for each database to determine whether the test sample is in the database. Figure 10 shows the correlation p calculated by MDA between the test samples and a database of five drugs (morphine, ketamine, N-acetylanthranilic acid, anthranilic acid, and methcathinone). The correlation p of flour, milk powder, and bean powder was low (less than 10), whereas that of methcathinone was more than 30, so an appropriate threshold can be chosen in the interval between 10 and 30. Therefore, after screening the test sample with MDA, one can judge whether the material is in the database and, if so, give its name. In addition, the number of samples in the database affects the MDA recognition results. When the database contains only five samples, the correlation p of the most similar sample remains high. As the number of samples in the database increases, p gradually decreases; however, by adjusting the weights e, f, and g of the three algorithms in MDA, the D–V can be kept nearly unchanged so that the correct result is still identified. Therefore, for each database, the weight ratios of the three algorithms in MDA should be adjusted by training on more test samples to determine the best threshold.
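The weighting and threshold logic can be illustrated with the hypothetical sketch below. It does not reproduce the MDA equation of Sec. 2: the mapping of each metric to a similarity (1/(1 + d) for the PCA Euclidean distance, cos θ for the SAM angle, r used as-is), the normalization by e + f + g, and all names are assumptions chosen only to show how weights e, f, and g and a rejection threshold interact. Its score scale therefore differs from the correlation p in Fig. 10.

```python
import numpy as np


def fused_scores(pca_dist, sam_angle, corr_r, e=1.0, f=1.0, g=1.0):
    """HYPOTHETICAL weighted fusion of three per-substance metrics into one score.

    Each argument is an array over the database substances. Illustrative mappings:
    PCA Euclidean distance -> 1 / (1 + d)   (smaller distance, higher score)
    SAM angle (radians)    -> cos(angle)    (smaller angle, higher score)
    Pearson r              -> r             (used as-is)
    e, f, g weight the three terms.
    """
    s_pca = 1.0 / (1.0 + np.asarray(pca_dist, dtype=float))
    s_sam = np.cos(np.asarray(sam_angle, dtype=float))
    s_corr = np.asarray(corr_r, dtype=float)
    return (e * s_pca + f * s_sam + g * s_corr) / (e + f + g)


def identify(names, scores, threshold):
    """Return the best-matching name, or None if its score is below the threshold
    (the sample is then judged not to be in the database)."""
    i = int(np.argmax(scores))
    return names[i] if scores[i] >= threshold else None


names = ["morphine", "ketamine", "methcathinone"]
scores = fused_scores([5.0, 4.0, 0.1], [0.6, 0.5, 0.05], [0.2, 0.3, 0.95])
```

With these made-up metric values, `identify(names, scores, 0.5)` returns "methcathinone", while raising the threshold to 0.99 rejects every candidate, mimicking how an amorphous powder outside the database would be screened out.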
This study combined the SAM method, the correlation coefficient method, and principal component analysis with Euclidean distance judgment to achieve accurate identification of unknown samples. With multivariate discriminant analysis, the D–V increased significantly, which makes it easier to identify the substance in the database most similar to the test sample and reduces the identification error rate. Moreover, the method can accurately determine whether the sample under test is in the database and, if so, which database sample it is. This provides the basic recognition function of the EDXRD experimental device. Establishing similarity thresholds for different databases requires verification with different experimental data, which is planned for future work.
National Key R&D Program of China (2016YFC0800904-Z03); Fundamental Research Funds for the Central Universities (22120180135).
1. M. D. Herr, J. J. Mcinerney, and D. G. Lamser, “A flying spot X-ray system for Compton backscatter imaging,” IEEE Trans. Med. Imaging 13(3), 461–469 (1994).
2. G. Harding, “X-ray scatter tomography for explosives detection,” Radiat. Phys. Chem. 71(3–4), 869–881 (2004).
3. Y. F. Chen, X. Wang, Q. H. Song, J. Xu, and B. Z. Mu, “Development of a high-energy-resolution EDXRD system with a CdTe detector for security inspection,” AIP Adv. 8(10), 105113 (2018).
4. K. Wells and D. A. Bradley, “A review of X-ray explosives detection techniques for checked baggage,” Appl. Radiat. Isot. 70(8), 1729–1746 (2012).
5. G. Harding, H. Strecker, D. Kosciesza, and J. Gordon, “Detector considerations relevant to x-ray diffraction imaging for security screening applications,” Proc. SPIE 7306, 730619 (2009).
6. E. Cook, R. Fong, J. Horrocks, D. Wilkinson, and R. Speller, “Energy dispersive X-ray diffraction as a means to identify illicit materials: a preliminary optimisation study,” Appl. Radiat. Isot. 65(8), 959–967 (2007).
7. C. Crespy, P. Duvauchelle, V. Kaftandjian, F. Soulez, and P. Ponard, “Energy dispersive X-ray diffraction to identify explosive substances: Spectra analysis procedure optimization,” Nucl. Instrum. Methods Phys. Res., Sect. A 623(3), 1050–1060 (2010).
8. Y. Jiang and P. Liu, “Feature extraction for identification of drug and explosive concealed by body packing based on positive matrix factorization,” Measurement 47, 193–199 (2014).
9. E. J. Cook, S. Pani, L. George, S. Hardwick, J. A. Horrocks, and R. D. Speller, “Multivariate Data Analysis for Drug Identification,” IEEE Trans. Nucl. Sci. 56(3), 1459–1464 (2009).
10. M. Partridge and M. Jabri, “Robust principal component analysis,” J. Assoc. Comput. Mach. 58(3), 11 (2011).
11. A. Dicken, K. Rogers, and P. Evans, “The separation of X-ray diffraction patterns for threat detection,” Appl. Radiat. Isot. 68(3), 439–443 (2010).
12. P. E. Dennison, K. Q. Halligan, and D. A. Roberts, “A comparison of error metrics and constraints for multiple endmember spectral mixture analysis and spectral angle mapper,” Remote Sens. Environ. 93(3), 359–367 (2004).
13. J. Benesty, J. Chen, and Y. Huang, “On the importance of the Pearson correlation coefficient in noise reduction,” IEEE Trans. Audio Speech Lang. Process. 16(4), 757–765 (2008).
14. D. O. Flynn, H. Desai, C. B. Reid, C. Christodoulou, M. D. Wilson, M. C. Veale, P. Seller, D. Hills, B. Wong, and R. D. Speller, “Identification of simulants for explosives using pixellated X-ray diffraction,” Crime Sci. 2(1), 4 (2013).
15. E. J. Cook, J. A. Griffiths, M. Koutalonis, C. Gent, S. Pani, J. A. Horrocks, L. George, S. Hardwick, and R. Speller, “Illicit drug detection using energy dispersive x-ray diffraction,” Proc. SPIE 7310, 73100I (2009).
16. B. C. Turton, “A novel variant of the Savitzky-Golay filter for spectroscopic applications,” Meas. Sci. Technol. 3(9), 858–863 (1992).
17. Powder Diffraction File, JCPDS, International Centre for Diffraction Data, Pennsylvania (1980).