
Visual defect detection for optical lenses based on the YOLOv5-C3CA-SPPF network model


Abstract

Defects in an optical lens directly affect its scattering properties and degrade the performance of the optical element. Although machine vision has been widely adopted in place of manual inspection, feature fusion based on serial operations and edge detection cannot recognize low-contrast and multi-scale targets in the lens. To address these challenges, this study proposes an improved YOLOv5-C3CA-SPPF network model to detect defects on the surface and inside of the lens. A hybrid module combining coordinate attention and CSPNet (C3) is incorporated into YOLOv5-C3CA to improve the extraction of target feature information and the detection accuracy. Furthermore, an SPPF feature fusion module is inserted into the neck of the network model to further improve detection accuracy. To enhance the performance of supervised learning, a dataset containing 3800 images was created, with more than 600 images for each type of defect sample. The experimental results show that the mean average precision (mAP) of the YOLOv5-C3CA-SPPF algorithm is 97.1% at a detection speed of 41 FPS. Compared with traditional lens surface defect detection algorithms, YOLOv5-C3CA-SPPF detects surface and internal optical lens defects more accurately and quickly. The experiments also show that the model has good generalizability and robustness, which is favorable for on-line automatic quality inspection of optical lens defects and provides an important guarantee for the quality consistency of finished products.

© 2023 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

Optical glass lenses have a high refractive index and low dispersion and are widely used to manufacture various precision lenses [1]. However, in the production process of optical glass lenses, defects caused by raw materials, melting, refractory materials, shaping, annealing, and other factors are inevitable. These defects take many forms, such as impurities, cracks, press marks, and bubbles appearing on the surface and in the interior of the lens. Defects in the lens directly affect its scattering characteristics and the performance of the optical element. Under normal visible light, transparent bubbles with low contrast against the background are easily missed. Traditionally, optical defects were detected manually under a special light source, but this method has low detection efficiency and high labor intensity. Therefore, defect detection is an essential issue in the lens production process; it provides an important guarantee for the quality consistency of finished products and reduces financial loss.

As computer vision and artificial intelligence technologies develop rapidly, machine vision and deep-learning-based detection are gradually replacing manual inspection. Two-stage target detection algorithms, including R-CNN [2], Fast R-CNN [3], Faster R-CNN [4], and Mask R-CNN [5], use a region proposal network (RPN) [6] to extract the location and type information of targets and then use the last feature map layer to predict object regions. In contrast, single-stage target detection algorithms obtain the location and type information of targets directly in an end-to-end fashion; for example, SSD [7], YOLO [8], and YOLOv3 [9,10] perform object detection using multi-scale feature maps. For vision-based approaches, defects with a clear edge profile and high contrast are easily detected; however, defects such as low-contrast impurities and small bubbles in the optical lens pose a challenge to automatic defect detection. To improve the detection of multi-scale defects, low-contrast defects, and nested, stacked, and intersecting defects in the optical lens, this study proposes a YOLOv5-C3CA-SPPF based approach with a hybrid module that combines coordinate attention and CSPNet (C3) to improve the extraction of defect features; the proposed YOLOv5-C3CA-SPPF therefore locates defects more accurately. To evaluate the proposed method experimentally, we created an optical lens dataset with more than 19000 defect samples in total, covering multi-scale defects with different morphological characteristics. The dataset contains optical lens images with different types of defects, such as impurities, cracks, bubbles, and press backs, distributed independently or in intersecting and stacked configurations. The experimental results indicate that the proposed YOLOv5-C3CA-SPPF is well suited to detecting low-contrast, multi-scale, and mixed complex defects, outperforming other popular methods.

The major contributions of our work can be summarized as follows: (a) A challenging dataset containing five different types of optical lens defects with different morphological characteristics was created. (b) A simple and effective hybrid module, YOLOv5-C3CA-SPPF, with fewer parameters was proposed, which embeds the location information of defects in the feature maps and significantly improves the defect detection ability. (c) We also studied the individual contributions of the aforementioned hybrid module and the SPPF module to accuracy and found that the contribution of the hybrid module is slightly higher than that of the SPPF module. Overall, the improved defect detection method can detect multi-scale defects, transparent low-contrast targets, and mixed complex defects.

The remainder of this paper is organized as follows: Section 2 discusses existing inspection algorithms and attention mechanisms for detection. Section 3 describes the dataset and the proposed defect detection method. Section 4 describes the experimental platform and parameters. Section 5 presents the results, and Section 6 concludes the paper.

2. Related works

2.1 Vision-based defect detection

With the development of machine vision, it is widely used in industry, especially for product defect detection. Jothi et al. proposed a generalized Hough transform algorithm for an automated intraocular lens defect detection and quality assessment system; the experimental results show that the algorithm reduces labor cost and leads to better segregation of the lenses [11]. Pan et al. proposed a defect detection algorithm that adopts optical techniques such as light transmission, transmitted fringe deflectometry, and dark-field illumination, and showed that the method can detect microdefects of small-sized optical lenses with high efficiency and precision [12]. Although these methods are non-destructive, they work best when defects are scattered on the surface of parts with large contrast and clear defect characteristics, which are easy to discriminate and classify by filtering and the generalized Hough transform. However, many kinds of defects have low contrast similar to the background, and the generalized Hough transform cannot discriminate and classify them because profile features are lost after filtering.

In addition to the above issues, optical lens defects take different forms and scales, and some are widely distributed on the surface and inside the lens, so defect samples easily nest, stack, and intersect in the image. The complexity of these defect features places higher requirements on optical inspection.

Along with the rapid development of machine learning, deep learning has become widely used for vision-based automatic detection, and several vision-based methods have been proposed for surface defects of optical lenses, rails, etc. Yang et al. proposed a camera lens blemish detection model that adopts a convolutional neural network (CNN) and localizes blemish regions with the class activation map (CAM) technique; it achieves an accuracy of 99% and a recall of 98.7% for lens classification and blemish localization [13]. Fan et al. proposed a real-time and effective algorithm based on YOLOv4 that combines the cross-stage partial block of YOLOv4 and a convolutional block attention module; the algorithm improves the average precision (AP) of linear defects by 2.11%, reduces model size by 13.3% and parameters by 14.14%, improves frames per second (FPS) by 50%, and achieves real-time performance for industrial production [14]. Chen et al. proposed a method named YOLOv3-dense, which adopts densely connected convolutional networks (DenseNet) as the backbone of Darknet-53 for chip detection of surface-mounted device light-emitting diodes (SMD LEDs); the results show that the mean average precision (mAP) of the YOLOv3-dense model is higher than the CAM localization of the CNN by 33.69% and the traditional YOLOv3 by 14.98%, respectively [15]. Feng et al. proposed a rail defect detection algorithm that adopts two different MobileNet structures to test detection capability; the results show that the proposed algorithm has outstanding speed and accuracy [16].

Although these approaches help detect surface defects more precisely, the insufficient diversity of defect samples and the scarcity of complex samples may significantly reduce the generalization and robustness of the neural networks.

2.2 Attention mechanism

The attention mechanism enhances the learning ability, generalization, and robustness of CNN-based algorithms. Attention mechanisms can be divided into three types: spatial domain, channel domain, and mixed domain, with representatives such as SE [17], ECA [18], and CBAM [19]. The SE network focuses on assigning weights to different channel features but captures only low levels of correlation among the channels. The ECA network effectively remedies these defects and has good cross-channel information acquisition capability. The CBAM network adds spatial attention on top of channel attention but does not make full use of features from other scales. SE and ECA focus only on internal channel information and fail to take location information into account. CBAM uses global pooling operations to map location information, which can only capture local information and cannot obtain long-range dependencies.
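To make this distinction concrete, the sketch below is a generic (not paper-specific) PyTorch rendering of an SE-style channel attention block; the class name and reduction ratio are illustrative assumptions. The global average pool collapses the whole H×W plane into one value per channel, which is exactly why this family of attention retains no location information.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Generic squeeze-and-excitation channel attention (illustrative sketch)."""
    def __init__(self, channels: int, reduction: int = 16):  # reduction ratio assumed
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global average over H x W
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # excitation: per-channel weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # reweight channels; spatial position information is discarded
```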

Several defect detection algorithms combining CNNs with attention mechanisms have been proposed. The attention mechanism has a complex structure and consumes considerable computational resources owing to the large number of model parameters. To detect insulator strings in complex scenes with diverse appearance and occlusion, Han et al. proposed an improved U-Net-based insulator segmentation method that embeds the ECA-Net attention mechanism and achieves an average overlap IoU of 96.8% [20]. Xiang et al. proposed a model based on one-dimensional convolutional neural networks that combines the advantages of Inception and CBAM and uses channel and spatial attention mechanisms to boost the feature extraction capability of the Inception network [21].

3. Materials and methods

3.1 Data acquisition

In optical lens production, various defects inevitably appear on the surface and in the interior of optical glass lenses, usually caused by raw material defects, melting defects, refractory material defects, shape defects, annealing defects, and other factors. The common defects of the optical glass lens are shown in Fig. 1. We collected different kinds of defects from an optical lens manufacturer and created a large number of training samples by manual selection. To simulate the randomness and characteristics of complex samples, a script randomly selected single-defect samples from a batch containing the five defect types above to compose defect images. Furthermore, data augmentation techniques including random rotation and horizontal and vertical flipping were employed to enlarge the dataset (see the sketch below). In addition, a generative adversarial network (GAN [22]) was adopted to learn the potential distribution of the defect data and generate samples approximating the real data distribution; no special parameters or prior conditions were set, which significantly enhances the diversity of the generated data. A total of 3800 images were acquired, with more than 600 images for each type of sample: bubbles, cold explosions, internal cracks, impurities, press backs, and mixed samples. The LabelImg software was employed to label the sample datasets in the format required for model training; each annotation comprises the type, number, bounding box, and center-point coordinates of the target.
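As a hedged illustration of the augmentation step described above, the sketch below applies random rotation and horizontal/vertical flips with torchvision; the rotation range and probabilities are assumptions, since the text does not give exact parameters. For detection data, the bounding-box labels must be transformed together with the image (e.g., with a box-aware augmentation library), which this minimal image-only sketch omits.

```python
from torchvision import transforms

# Minimal image-only augmentation pipeline (all parameter values are assumptions).
augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),    # random rotation, assumed +/-180 degrees
    transforms.RandomHorizontalFlip(p=0.5),    # horizontal flip
    transforms.RandomVerticalFlip(p=0.5),      # vertical flip
    transforms.ToTensor(),                     # PIL image -> float tensor in [0, 1]
])
```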

Fig. 1. Defects of the optical lens: (a) transparent bubbles, (b) dark bubbles, (c) cold explosion, (d) internal crack, (e) impurities, (f) press back, (g) multi-scale mixed sample of bubbles, impurities, and press backs, (h) nested mixed sample of bubbles, impurities, and cold explosions, (i) stacked and intersecting mixed sample of bubbles, impurities, and cold explosions.

3.2 YOLOv5-C3CA-SPPF

This paper proposes a new defect detection method to enhance detection performance for the optical lens. As shown in Fig. 2, the proposed YOLOv5-C3CA-SPPF approach consists of three main parts: YOLOv5 provides the basic algorithm framework, C3CA improves feature representation, and SPPF enlarges the receptive field of the convolutional network. The proposed method overcomes a shortcoming of CSPNet: a large-scale convolutional neural network extracts only local relations and lacks the ability to capture long-distance relationships in the feature information. For this reason, this paper proposes a simple and effective C3CA hybrid module; additionally, the SPPF layer concatenates feature maps of different scales to increase the receptive field, effectively improving the multi-scale defect detection and inference speed of the network.

3.2.1 Network structure of the original YOLOv5

Compared with Faster R-CNN, YOLOv3, and YOLOv4 [23], YOLOv5 [24] is an advanced algorithm with outstanding precision and recall. There are four main versions: YOLOv5s [25] with the smallest storage size, YOLOv5m, YOLOv5l, and YOLOv5x [26] with the largest. To achieve fast optical lens defect detection, YOLOv5s was selected in our experiment. The basic network structure of YOLOv5 is composed of three parts: backbone, PANet, and YOLO head. As shown in Fig. 2, CSPDarknet, the backbone of YOLOv5, is employed to extract features and mainly consists of Focus, CSP, and SPP. Focus performs a slicing operation on the feature map; CSPNet is composed of residual network structures, which alleviate gradient vanishing and over-fitting when training deep networks; SPP removes the fixed-size constraint so that a feature map of any size can be converted into a fixed-size feature vector between the convolution layer and the fully connected layer. PANet is a series of pyramid layers that integrate the simple targets of the shallow feature maps and the complex targets of the deep feature maps; this information fusion is more conducive to target differentiation. The YOLO head uses non-maximum suppression (NMS) [27] as the criterion for box prediction and class regression. However, as defect samples become more complex and diverse, the algorithm must extract more complex feature information with higher location accuracy, which challenges the feature representation ability, generalization, and robustness of the YOLOv5 algorithm.
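As an illustration of the Focus slicing operation mentioned above, the sketch below follows the publicly known YOLOv5 design: the feature map is sampled at four pixel phase offsets and the slices are stacked on the channel axis, halving the resolution without discarding information. The kernel size and channel counts are assumptions.

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Focus slicing: (B, C, H, W) -> (B, 4C, H/2, W/2), then a conv to c_out channels."""
    def __init__(self, c_in: int, c_out: int, k: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(4 * c_in, c_out, k, stride=1, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_out),
            nn.SiLU(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Four phase-offset sub-samples: every second pixel starting at
        # (0,0), (1,0), (0,1), and (1,1), concatenated on the channel axis.
        patches = [x[..., ::2, ::2], x[..., 1::2, ::2],
                   x[..., ::2, 1::2], x[..., 1::2, 1::2]]
        return self.conv(torch.cat(patches, dim=1))
```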

Fig. 2. YOLOv5-C3CA-SPPF network structure.

3.2.2 Improved C3CA model based CSPNet

The general CSPNet uses multiple convolution operations. However, because the residual structure of CSPNet performs feature extraction with an interleaved combination of $(1\times 1)$ and $(3\times 3)$ convolutional layers, a great deal of long-distance relational information is lost and the defect location information cannot be kept in the feature map, which leads to abnormal deviations in the prediction regression and classification. To avoid this, this study combines CSPNet and the coordinate attention (CA) module [28] in series to boost detection and location accuracy; the combination not only mitigates the gradient vanishing problem when increasing the depth of the neural network and the number of model parameters, but also captures long-range dependencies along one spatial direction while retaining accurate location information along the other. The structure of the hybrid module is shown in Fig. 3. First, the feature map is fed into the CSPNet structure. After processing by the mixed layer (convolutional layer + BatchNorm layer + SiLU layer) of CSPNet, the input feature map of the residual network is summed with the result of the combined $(1\times 1)$ and $(3\times 3)$ convolutional layers. The result of the mixed layer and the result of the residual network are then concatenated and processed by the next mixed layer. Finally, the result is sent to the CA module. The calculation processes of the residual network, the mixed layer, and CSPNet are shown in (1)–(3), respectively.

$$BL(F)=(f^{3 \times 3}(f^{1\times1}(F))+F)$$
$$CBA(F)={\sigma}_{L}({B}_{N}({f^{c1 \times c2}}_{2d}(F)))$$
$${x}_{C}=CBA(Cat(CBA(F),BL(CBA(F))\times N))$$
where $F$ is the feature map, $f$ is the convolution operation, $\sigma _{L}$ is the SiLU function, $Cat$ is the feature map concatenation, and $N$ is the number of residual blocks. As Fig. 3 shows, the feature map processed by the above CSPNet is sent to the CA mechanism in the hybrid module. Coordinate attention uses two steps to encode channel relationships and long-distance relationships: coordinate information embedding and coordinate attention generation. In the first step, to encode feature maps along the width and height directions, the input feature map is processed by global average pooling along each of the two directions. The height- and width-direction pooling operations are given in (4) and (5),
$${{Z}^{h}_{C}}(h)=\frac{1}{W}\; \sum_{{0\leq i < W}}{x}_{C}(h,i)$$
$${{Z}^{w}_{C}}(w)=\frac{1}{H}\; \sum_{{0\leq j < H}}{x}_{C}(j,w)$$

After that, in the second step, the feature maps containing the global perspective along width and height are concatenated and processed by a $1 \times 1$ convolution operation $F_{1}$, which reduces the channel dimension of the feature map to $C/r$; the result is then treated with normalization and a nonlinear activation $\delta$, as shown in (6).

$$f=\delta({F}_{1}([{Z}^{h},{Z}^{w}]))$$

The resulting feature map $f$ is split along the spatial dimension into $f^{h}$ and $f^{w}$, each of which is processed by a $1 \times 1$ convolution ($F_{h}$, $F_{w}$) and the sigmoid function ($\sigma$) in the height and width directions; the relevant calculation equations are shown in (7) and (8).

$${g}^{h}=\sigma({F}_{h}([{f}^{h}]))$$
$${g}^{w}=\sigma({F}_{w}([{f}^{w}]))$$

Eventually, the final feature map is obtained by multiplying the input feature map of the CA module by the attention-weight feature maps along height and width, as shown in (9).

$${y}_{C}(i,j)={x}_{C}(i,j) \times {{g}^{h}}(i) \times {{g}^{w}}(j)$$
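The sketch below is a minimal PyTorch rendering of Eqs. (4)–(9): directional average pooling, a shared $1 \times 1$ convolution $F_{1}$ with normalization and a nonlinearity $\delta$ (assumed to be h-swish, following the coordinate attention paper), and the two sigmoid-gated $1 \times 1$ convolutions $F_{h}$ and $F_{w}$. The reduction ratio $r$ is an assumed hyperparameter.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of the coordinate attention computation of Eqs. (4)-(9)."""
    def __init__(self, channels: int, r: int = 32):   # reduction ratio r is assumed
        super().__init__()
        mid = max(8, channels // r)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # Eq. (4): average over width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # Eq. (5): average over height
        self.f1 = nn.Sequential(                       # Eq. (6): shared 1x1 conv F1
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            nn.Hardswish(),                            # nonlinearity delta (assumed)
        )
        self.f_h = nn.Conv2d(mid, channels, 1)         # Eq. (7)
        self.f_w = nn.Conv2d(mid, channels, 1)         # Eq. (8)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, _, h, w = x.shape
        z_h = self.pool_h(x)                           # (B, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)       # (B, C, W, 1)
        f = self.f1(torch.cat([z_h, z_w], dim=2))      # concat along the spatial axis
        f_h, f_w = torch.split(f, [h, w], dim=2)
        g_h = torch.sigmoid(self.f_h(f_h))             # (B, C, H, 1) height weights
        g_w = torch.sigmoid(self.f_w(f_w.permute(0, 1, 3, 2)))  # (B, C, 1, W)
        return x * g_h * g_w                           # Eq. (9), via broadcasting
```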

Fig. 3. Diagram of the C3CA structure.

3.2.3 Improved spatial pyramid pooling faster module

To acquire high-level semantic information from multi-scale features and further improve the detection accuracy and speed of the algorithm, a spatial pyramid pooling faster (SPPF) module is inserted between the convolution layer and the fully connected layer. The SPPF module [29] integrates multi-scale local feature information and gives the network a global perspective, which is convenient for acquiring rich multi-scale feature expressions, as shown in Fig. 4. The original SPP module concatenates, in parallel, the input feature map with three feature maps produced by max-pooling operations with $5\times 5$, $9\times 9$, and $13\times 13$ kernels to form the final feature map; however, this operation takes a long time. To further improve network efficiency and detection speed, the SPPF module instead fuses the feature map processed by the mixed layer (convolutional layer + BatchNorm layer + SiLU layer) with three feature maps produced by applying a $5\times 5$ max-pooling operation in series.
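The difference between the two variants can be seen in a short sketch. Chaining a $5\times 5$ max-pool covers a $9\times 9$ receptive field after two applications and $13\times 13$ after three, so the serial SPPF reproduces the parallel 5/9/13 pooling of SPP while reusing intermediate results; the channel widths follow the common YOLOv5 convention and are assumptions here.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Serial SPPF: three chained 5x5 max-pools replace SPP's parallel 5/9/13 pools."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        c_mid = c_in // 2                              # assumed halving, as in YOLOv5
        self.cv1 = nn.Sequential(nn.Conv2d(c_in, c_mid, 1, bias=False),
                                 nn.BatchNorm2d(c_mid), nn.SiLU())
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.cv2 = nn.Sequential(nn.Conv2d(c_mid * 4, c_out, 1, bias=False),
                                 nn.BatchNorm2d(c_out), nn.SiLU())

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.cv1(x)
        y1 = self.pool(x)    # effective 5x5 receptive field
        y2 = self.pool(y1)   # effective 9x9
        y3 = self.pool(y2)   # effective 13x13
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```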

Fig. 4. SPP layer.

3.2.4 Loss function

The YOLOv5-C3CA-SPPF loss function is shown in (10).

$${Loss}_{object}={Loss}_{loc} + {Loss}_{conf} +{Loss}_{class}$$
which consists of location loss ${Loss}_{loc}$, confidence loss ${Loss}_{conf}$ and category loss ${Loss}_{class}$, as shown in (11) – (13), respectively.
$${Loss}_{loc}=1-GIoU$$
$$\begin{aligned} {Loss}_{conf}={-} \sum_{i=0} ^{K \times K} \sum_{j=0} ^{M} {I}^{obj}_{ij}\left[{\hat{C}}^{j}_{i}\log{C}^{j}_{i} + (1-{\hat{C}}^{j}_{i})\log(1-{C}^{j}_{i})\right] \\ -\lambda_{noobj} \sum_{i=0} ^{K \times K} \sum_{j=0} ^{M}{I}^{noobj}_{ij}\left[{\hat{C}}^{j}_{i}\log{C}^{j}_{i} + (1-{\hat{C}}^{j}_{i})\log(1-{C}^{j}_{i})\right] \end{aligned}$$
$${Loss}_{class}={-} \sum_{i=0} ^{K \times K} {I}^{obj}_{ij} \sum_{c\in classes} \left[{\hat{P}}^{j}_{i}\log{P}^{j}_{i} + (1-{\hat{P}}^{j}_{i})\log(1-{P}^{j}_{i})\right]$$
where $K$ indicates that the final feature map is divided into $K \times K$ cells, $M$ is the number of anchor boxes corresponding to each grid cell, $I_{ij}^{obj}$ indicates an anchor box containing a target, $I_{ij}^{noobj}$ indicates an anchor box without a target, and $\lambda _{noobj}$ is the confidence loss weight for anchor boxes without a target. In this paper, CIoU replaces GIoU as the regression loss function for the target box, as shown in (14).
$$CIoU = IoU-\frac{{\rho}^{2}(b,{b}^{gt})}{{c}^{2}}\; -\alpha\nu$$
where $\alpha$ is a balance parameter and does not participate in the gradient calculation.
$$\alpha=\frac{\nu}{(1-IoU)+\nu}$$
and $\nu$ is employed to measure the consistency of the aspect ratio:
$$\nu=\frac{4}{{\pi}^{2}}\left(\arctan \frac{{w}^{gt}}{{h}^{gt}} -\arctan \frac{w}{h}\right)^{2}$$

CIoU comprehensively considers the overlap ratio, center-point distance, and aspect ratio between the ground-truth box and the prediction box, solving the consistency problem of overlapping-box regression.
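For reference, the sketch below evaluates Eqs. (14)–(16) for batches of corner-format boxes; the function name and the epsilon guard are illustrative additions, and box2 is taken as the ground-truth box.

```python
import math
import torch

def ciou(box1: torch.Tensor, box2: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """CIoU per Eqs. (14)-(16); boxes are (x1, y1, x2, y2), shape (N, 4); box2 is GT."""
    # Plain IoU from intersection and union
    xi1 = torch.max(box1[:, 0], box2[:, 0]); yi1 = torch.max(box1[:, 1], box2[:, 1])
    xi2 = torch.min(box1[:, 2], box2[:, 2]); yi2 = torch.min(box1[:, 3], box2[:, 3])
    inter = (xi2 - xi1).clamp(0) * (yi2 - yi1).clamp(0)
    w1, h1 = box1[:, 2] - box1[:, 0], box1[:, 3] - box1[:, 1]
    w2, h2 = box2[:, 2] - box2[:, 0], box2[:, 3] - box2[:, 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)

    # rho^2: squared distance between the two box centers
    rho2 = ((box1[:, 0] + box1[:, 2] - box2[:, 0] - box2[:, 2]) ** 2 +
            (box1[:, 1] + box1[:, 3] - box2[:, 1] - box2[:, 3]) ** 2) / 4
    # c^2: squared diagonal of the smallest box enclosing both
    cw = torch.max(box1[:, 2], box2[:, 2]) - torch.min(box1[:, 0], box2[:, 0])
    ch = torch.max(box1[:, 3], box2[:, 3]) - torch.min(box1[:, 1], box2[:, 1])
    c2 = cw ** 2 + ch ** 2 + eps

    # Eq. (16): aspect-ratio term; Eq. (15): alpha is kept out of the gradient
    v = (4 / math.pi ** 2) * (torch.atan(w2 / (h2 + eps)) - torch.atan(w1 / (h1 + eps))) ** 2
    with torch.no_grad():
        alpha = v / ((1 - iou) + v + eps)
    return iou - rho2 / c2 - alpha * v  # Eq. (14); the loss is 1 - CIoU, per Eq. (11)
```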

4. Experimental setup

To verify the detection capability of the YOLOv5-C3CA-SPPF algorithm, the SSD, Faster R-CNN, RetinaNet, YOLOv4, and YOLOv5 algorithms were trained on the same sample dataset. The experimental platform comprises an Intel Core i7-10870H CPU and an NVIDIA GeForce RTX 3060 GPU running Windows 10; the deep learning framework is PyTorch 0.4.1.

In our experiment, all models were trained with stochastic gradient descent, and the learning rate was adjusted at fixed intervals. The weight decay and momentum were set to 0.0005 and 0.9, respectively. In addition, the detection effectiveness of YOLOv5-C3CA-SPPF was evaluated quantitatively using multiple indices: precision, recall, F1-score, mAP (mean average precision), and FPS (frames per second). Precision measures the identification accuracy for each type of target, recall shows the model's ability to find targets, the F1-score is the harmonic mean of precision and recall, and FPS judges the average inference speed of the detector. From the five types of lens defect samples, a total of 3800 images were formed; the program randomly selected 3040 training images and 760 test images according to an 8:2 ratio, and all models were trained for 100 epochs on the sample dataset.
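The sketch below mirrors the training configuration described above (SGD, momentum 0.9, weight decay 0.0005, fixed-interval learning-rate steps, an 8:2 split, 100 epochs). The initial learning rate, step interval, batch size, and the `model`/`full_dataset` objects are hypothetical placeholders, since the text does not specify them.

```python
import torch
from torch.utils.data import DataLoader, random_split

# `model` and `full_dataset` are assumed to exist: the detector and the
# 3800-image labeled set. Values marked "assumed" are not given in the text.
n_train = int(0.8 * len(full_dataset))                       # 8:2 train/test split
train_set, test_set = random_split(full_dataset,
                                   [n_train, len(full_dataset) - n_train])
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)  # batch size assumed

optimizer = torch.optim.SGD(model.parameters(), lr=0.01,     # initial lr assumed
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,       # fixed-interval lr decay
                                            step_size=30, gamma=0.1)  # assumed values

for epoch in range(100):                                     # 100 epochs, as in the text
    for images, targets in train_loader:
        loss = model(images, targets)    # total loss of Eq. (10): loc + conf + class
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
```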

5. Experimental results

5.1 Comparison of different object detection algorithms

To verify the effectiveness of the proposed method for optical lens defect detection, this experiment compares the SSD, Faster R-CNN, RetinaNet [30], YOLOv4, and YOLOv5 models with our improved YOLOv5-C3CA-SPPF (Ours). Table 1 shows that YOLOv5-C3CA-SPPF (Ours) performs well on the test set, with a precision of 97.2%, a recall of 96.7%, an F1-score of 96.8%, an mAP of 97.1%, and an FPS of 41. The improved algorithm outperformed SSD (74.3% mAP), Faster R-CNN (78.1% mAP), RetinaNet (95.4% mAP), YOLOv4 (95.7% mAP), and YOLOv5 (96.0% mAP). Furthermore, YOLOv5-C3CA-SPPF (Ours) exceeds RetinaNet by 1.7 percentage points while offering almost twice the detection speed. To compare the YOLOv5 and YOLOv5-C3CA-SPPF methods more intuitively, the overall validation loss curves of the two models are displayed in Fig. 5. The loss curve of YOLOv5-C3CA-SPPF shows faster learning and a more notable decline in loss value than that of YOLOv5 in the preliminary stage, and the YOLOv5-C3CA-SPPF model converges to a lower loss value during the middle and final phases of training. The accuracy of optical lens defect detection rises as the total loss value falls. These results make clear that the C3CA module and the SPPF module of YOLOv5-C3CA-SPPF enhance the feature representation ability. Overall, YOLOv5-C3CA-SPPF (Ours) offers excellent accuracy and inspection speed, making it well suited to testing complex samples in different situations.

Fig. 5. Validation loss curve.

Table 1. Result comparison of different target detection algorithms on the sample data set.

5.2 Qualitative comparison of detection algorithms

The outcomes of the different detection algorithms are compared qualitatively in Fig. 6. The proposed YOLOv5-C3CA-SPPF algorithm can detect defects in different scenarios of the optical lens (multi-scale targets such as small bubbles and larger press-back defects, and low-contrast targets such as bubbles against the background). In particular, the proposed algorithm can detect nested, stacked, and intersecting defect samples (nested bubbles, and stacked and intersecting impurities and bubbles), outperforming other algorithms including Faster R-CNN and YOLOv5. The primary cause is that the C3CA module and the SPPF module reinforce the feature expression ability, which improves the detection capability.

Fig. 6. Comparison of multi-objective detection results among YOLOv5-C3CA-SPPF (Ours), YOLOv5, and Faster R-CNN on the sample data set.

5.3 Investigation of different network modules

This study uses the C3CA module and the SPPF module to improve the detection performance of YOLOv5. We verify the effectiveness of each contribution through two groups of ablation studies.

5.3.1 C3CA module for YOLOv5-C3CA-SPPF

An ablation experiment verifying the effectiveness of the C3CA module is shown in Table 2. YOLOv5-SPPF with the C3CA module obtains a precision of 97.3%, a recall of 96.7%, and an mAP of 97.1%, exceeding YOLOv5-SPPF without the C3CA module by 0.8 percentage points. The C3CA module positively influences model performance by making use of valid channel information and retaining accurate location information. These improvements demonstrate that the C3CA module is conducive to increasing model learning ability and feature representation ability and to improving detection performance. To further show the effectiveness of the C3CA module, the training and validation results are presented in Fig. 7. Among the defect test results of the YOLOv5-C3CA-SPPF method, the value for impurities was the highest at 99.4%; press backs reached 92.7%, cold explosions 96.8%, and internal cracks and bubbles 98.4%, giving an mAP (0.5) of 97.1% over all classes. The loss of the YOLOv5-C3CA-SPPF model is divided into the training group in the first three columns and the validation group in the second row, each containing box loss, objectness loss, and classification loss. As shown in Fig. 8, over 100 training epochs the training group and the validation group follow a similar trend: in the initial phase of training there is a significant decline in the loss value with fast learning, and the model converges in the middle and later phases. The accuracy of the model improves continuously as the loss value decreases. In this study, the epoch with the lowest total loss value is used for identification on the test set.

Fig. 7. Precision-recall curve.

Fig. 8. Key indicators over the training epochs.

Table 2. Ablation study of YOLOv5-SPPF without and with the C3CA module.

5.3.2 Influence of the SPPF module on YOLOv5-C3CA-SPPF detection

The SPPF module is a positive factor that boosts the YOLOv5-C3CA-SPPF detection capability; we also explore the respective influence of the SPPF and SPP modules on YOLOv5-C3CA. The results are shown in Table 3. YOLOv5-C3CA with the SPPF module achieves 97.2% precision, 96.5% recall, 97.1% mAP (0.5), and an inference time of 21.7 ms, outperforming YOLOv5-C3CA with the SPP module in both accuracy and speed. This means that the SPPF module helps the network learn multi-scale feature information. Furthermore, to show the effectiveness of YOLOv5-C3CA with the SPPF module intuitively, more detection examples of YOLOv5-C3CA-SPPF are presented in Fig. 9. YOLOv5-C3CA-SPPF can detect not only the small bubbles but also the large press backs; in particular, it detects all targets among nested, stacked, and intersecting defect samples. The main reason is that the SPPF module enlarges the receptive field of the network, effectively enhancing its learning ability and robustness. The strong detection performance of YOLOv5-C3CA-SPPF provides an effective way to automate defect detection in optical lenses.

Fig. 9. Detection results of the YOLOv5-C3CA-SPPF model on the sample data set.

Table 3. Ablation study of YOLOv5-C3CA without and with the SPPF module.

6. Conclusions

To solve the problem that traditional machine vision for automatic optical lens defect detection cannot effectively detect multi-scale defects, low-contrast defects, and nested, stacked, and intersecting defect samples, this study proposed a YOLOv5-C3CA-SPPF based approach with a hybrid module combining coordinate attention and CSPNet (C3) to augment the extraction of defect features and improve the accuracy of optical lens defect detection. We created an optical lens defect dataset with more than 19000 defect samples covering five typical defect classes: bubbles, cold explosions, internal cracks, impurities, and press backs. The experimental results demonstrate that YOLOv5-C3CA-SPPF achieves a precision of 97.1%, an mAP of 97.1%, and an FPS of 41, outperforming the original YOLOv5 model and other state-of-the-art methods (such as SSD, Faster R-CNN, and RetinaNet). Ablation studies showed that the C3CA module boosts the optical lens defect detection precision by 0.9%, slightly more than the 0.6% contribution of the SPPF module. Overall, our approach achieves significant speed and accuracy for optical lens defect detection and can automatically discriminate low-contrast, small, and mixed complex defects in different complex scenes.

In the future, the proposed approach will be further optimized in terms of network structure, parameters, and model size. In addition, few-shot learning and 3D visual features will be explored to further improve defect detection accuracy.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. S. Yin, H. Jia, G. Zhang, F. Chen, and K. Zhu, “Review of small aspheric glass lens molding technologies,” Front. Mech. Eng. 12(1), 66–76 (2017). [CrossRef]  

2. X. Xie, G. Cheng, J. Wang, X. Yao, and J. Han, “Oriented r-cnn for object detection,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 3520–3529.

3. J. Li, X. Liang, S. Shen, T. Xu, J. Feng, and S. Yan, “Scale-aware fast r-cnn for pedestrian detection,” IEEE Trans. Multimedia 20(4), 985–996 (2017). [CrossRef]  

4. B. Liu, W. Zhao, and Q. Sun, “Study of object detection based on faster r-cnn,” in 2017 Chinese Automation Congress (CAC), (IEEE, 2017), pp. 6233–6236.

5. P. Bharati and A. Pramanik, “Deep learning techniques—r-cnn to mask r-cnn: a survey,” Computational Intelligence in Pattern Recognition pp. 657–668 (2020).

6. Y. Zhang, Z. Wang, and Y. Mao, “Rpn prototype alignment for domain adaptive object detector,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 12425–12434.

7. S. Zhai, D. Shang, S. Wang, and S. Dong, “Df-ssd: An improved ssd object detection algorithm based on densenet and feature fusion,” IEEE access 8, 24344–24357 (2020). [CrossRef]  

8. W. Lan, J. Dang, Y. Wang, and S. Wang, “Pedestrian detection based on yolo network model,” in 2018 IEEE international conference on mechatronics and automation (ICMA), (IEEE, 2018), pp. 1547–1551.

9. J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767 (2018). [CrossRef]  

10. L. Zhao and S. Li, “Object detection algorithm based on improved yolov3,” Electronics 9(3), 537 (2020). [CrossRef]  

11. A. Jothi, S. Jayaram, A. Dubey, et al., “Intra-ocular lens defect detection using generalized hough transform,” in 2017 6th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO), (IEEE, 2017), pp. 177–181.

12. J. Pan, N. Yan, L. Zhu, X. Zhang, and F. Fang, “Comprehensive defect-detection method for a small-sized curved optical lens,” Appl. Opt. 59(1), 234–243 (2020). [CrossRef]  

13. M. Yang, J. Wu, and X. Niu, “Camera module lens blemish detection based on neural network interpretability,” Multimed. Tools Appl. 81(4), 5373–5388 (2022). [CrossRef]  

14. C. Fan, L. Ma, L. Jian, and H. Jiang, “A real-time detection network for surface defects of mobile phone lens,” in Thirteenth International Conference on Graphics and Image Processing (ICGIP 2021), vol. 12083 (SPIE, 2022), pp. 224–232.

15. S.-H. Chen and C.-C. Tsai, “Smd led chips defect detection using a yolov3-dense model,” Adv. Eng. Inf. 47, 101255 (2021). [CrossRef]  

16. J. H. Feng, H. Yuan, Y. Q. Hu, J. Lin, S. W. Liu, and X. Luo, “Research on deep learning method for rail surface defect detection,” IET Electrical Systems Trans. 10(4), 436–442 (2020). [CrossRef]  

17. X. Pan, F. Yang, L. Gao, Z. Chen, B. Zhang, H. Fan, and J. Ren, “Building extraction from high-resolution aerial imagery using a generative adversarial network with spatial and channel attention mechanisms,” Remote Sens. 11(8), 917 (2019). [CrossRef]  

18. X. Li, H. Xia, and L. Lu, “Eca-cbam: Classification of diabetic retinopathy: Classification of diabetic retinopathy by cross-combined attention mechanism,” in 2022 the 6th International Conference on Innovation in Artificial Intelligence (ICIAI), (2022), pp. 78–82.

19. S. Woo, J. Park, J.-Y. Lee, and I. S. Kweon, “Cbam: Convolutional block attention module,” in Proceedings of the European conference on computer vision (ECCV), (2018), pp. 3–19.

20. G. Han, M. Zhang, W. Wu, M. He, K. Liu, L. Qin, and X. Liu, “Improved u-net based insulator image segmentation method based on attention mechanism,” Energy Rep. 7, 210–217 (2021). [CrossRef]  

21. L. Xiang, Z. Zhou, L. Miao, and Q. Chen, “Signal recognition method of x-ray pulsar based on cnn and attention module cbam,” in 2021 33rd Chinese Control and Decision Conference (CCDC), (IEEE, 2021), pp. 5436–5441.

22. K. Fan, P. Peng, H. Zhou, L. Wang, and Z. Guo, “Real-time high-performance laser welding defect detection by combining acgan-based data enhancement and multi-model fusion,” Sensors 21(21), 7304 (2021). [CrossRef]  

23. A. Bochkovskiy, C.-Y. Wang, and H.-Y. M. Liao, “Yolov4: Optimal speed and accuracy of object detection,” arXiv preprint arXiv:2004.10934 (2020). [CrossRef]  

24. D. Wang and D. He, “Channel pruned yolo v5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning,” Biosyst. Eng. 210, 271–281 (2021). [CrossRef]  

25. X. Wang, X. Wang, C. Hu, F. Dai, J. Xing, E. Wang, Z. Du, L. Wang, and W. Guo, “Study on the detection of defoliation effect of an improved yolov5x cotton,” Agriculture 12(10), 1583 (2022). [CrossRef]  

26. Y. Zhao, M. Xiao, H. Lv, J. Luo, X. Wang, and D. Luo, “Research on scanning acoustic image defects detection of integrated circuits based on yolox,” in 2022 23rd International Conference on Electronic Packaging Technology (ICEPT), (IEEE, 2022), pp. 1–4.

27. S. Qiu, G. Wen, Z. Deng, J. Liu, and Y. Fan, “Accurate non-maximum suppression for object detection in high-resolution remote sensing images,” Remote Sens. Lett. 9(3), 237–246 (2018). [CrossRef]  

28. S. Cheng, L. Wang, and A. Du, “Asymmetric coordinate attention spectral-spatial feature fusion network for hyperspectral image classification,” Sci. Rep. 11(1), 17408 (2021). [CrossRef]  

29. M. Kasper-Eulaers, N. Hahn, S. Berger, T. Sebulonsen, Ø. Myrland, and P. E. Kummervold, “Detecting heavy goods vehicles in rest areas in winter conditions using yolov5,” Algorithms 14(4), 114 (2021). [CrossRef]  

30. S. Liu, T. Cai, X. Tang, Y. Zhang, and C. Wang, “Visual recognition of traffic signs in natural scenes based on improved retinanet,” Entropy 24(1), 112 (2022). [CrossRef]  
