
Micro-optics assembly for fast axis collimation by means of convolutional neural network

Open Access

Abstract

The manufacture of high-power diode laser systems depends strongly on the quality of the collimation of the semiconductor laser diodes. The collimation process is conducted by precise alignment of micro-optics, where the first and most critical step is the placement of the fast axis collimator (FAC) lens. The sub-micron positioning of the FAC lens is conventionally conducted with an active alignment strategy, in which changes in the laser beam profile are monitored with an external sensitive camera and the optimization of the beam profile characteristics is controlled by a specifically programmed motorized robotic aligner. However, active alignment, while very accurate, often results in a higher cycle time than the passive approach, where the lens is placed in a pre-measured position. Here, we developed a new active approach, without closed-loop control of the micro-optics positioning, that relies on a pretrained convolutional neural network (CNN). We trained and evaluated three CNNs that predict the optimal lens position from a single camera image of the laser beam. We anticipate that implementation of the best-performing CNN-based model would reduce the alignment time from tens of seconds to hundreds of milliseconds and could be broadly applied in a high-volume manufacturing environment.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

1. Introduction

High-power diode laser systems have wide application both in industry and in academic studies [1]. The main components of these systems are semiconductor laser diodes, driving electronics, and the micro-optics that govern the laser light. The electronics enable the driving and operation of the high-power laser diodes, which serve as a powerful light-energy source. The arrangement of micro-optical components, such as lenses, mirrors, gratings, and dichroic filters, directs the laser beam and optimizes its coupling to the optical fiber or other optical setups, enabling the use of high-power laser energy in a variety of applications.

One of the most important micro-optical components is the fast axis collimator (FAC) lens, which collimates the emitting beam along the fast axis of the propagating laser beam [2]. The fast axis is defined as the direction in which the beam has the largest angle of divergence within the range of ca. 25° to 60° [1,3]. Typically, such lenses have a relatively short back focal length (from tens to hundreds of micrometers), and therefore minor misalignments can significantly affect the collimation of the laser beam, increase coupling losses, and thus result in low performance of an assembled high-power laser diode system.

Robotic systems are state-of-the-art tools that align optical components to collimate the laser beam, perform a coupling loss assessment, and control the quality of high-power diode laser systems. The conventional alignment of the FAC is performed actively by precisely placing the lens in front of the facet of the emitter (Fig. 1) while controlling the beam shape, divergence, and center position as indicators of a well-collimated beam [4,5]. These laser beam features, captured with a camera, are used to organize the feedback loop in the programming of the aligner robot [6–8]. Compared to passive alignment, where the lens is placed in a previously calculated position, active alignment improves the accuracy of the micro-optics placement, but the alignment time is often one order of magnitude higher because of the high number of iterations needed.


Fig. 1. Schematic approach of beam collimation with FAC lens. The laser beam is captured with a camera and used to organize the feedback loop for the 6-axes aligner.


In this work, we address this problem by developing a new high-precision active approach without closed-loop control that uses a convolutional neural network (CNN) to predict the final position of the FAC lens along the optical axis (Fig. 1). In this approach, the CNN receives the first image of the beam, the so-called first light, which is taken when the lens is passively placed in the initial position to begin the full active alignment procedure. Here we are primarily interested in alignment along the most critical axis, since alignment along the other axes is usually less important: these axes are often passively pre-aligned during lens handling and controlled during the lens gluing procedure.

We used high-power laser diode systems manufactured by Prima Electro S.p.A. that are equipped with an array of 976-nm laser diodes, each driven separately with a 6 A driving current during micro-optics assembly. Each emitter was operated in a QCW mode with a pulse period of 20 ms and a pulse width of 0.18 ms (i.e., a 0.9% duty cycle). The optical power of a single emitter was in the range of 4.5–5 W. The FAC lenses used to collimate the emitters had a back focal length of 70 micrometers. The positioning of the lenses was performed using a specially programmed AL500-AA automatic micro-optics assembly robot manufactured by ficonTEC Service GmbH (Germany) [9].

2. Development of the convolutional network

2.1 Training, validation, and test datasets

The images of the laser beam for training the CNNs were collected with the robotic system using a specially developed process. In this process, the focus position of the lens was actively found using a near-infrared sensitive camera and a set of additional lenses to focus the laser beam on the camera sensor. The image of the laser beam profile with the lens in focus was stored in the folder labeled with the class “0” (Fig. 2, top left image). Then the lens was moved with the 6-axes aligner along the optical axis by 1 µm, and the corresponding image of the beam was stored in the folder labeled “1”. The same process was repeated in 1 µm steps to collect a stack of images for a single emitter until 20 different images corresponding to 20 different classes (“0” to “19”) were collected. A representative stack of the laser beam images is presented in Fig. 2. With increasing FAC lens offset from the focal position the beam rapidly diverges and visually disappears around the collected class “15” (14 µm offset). The same collection process was repeated approximately 100 times, varying the FAC lenses and laser diodes. In this study, we used 10 individual laser diodes and 40 different FAC lenses. In total, we collected 2011 different images of the laser beam (Fig. 2) over two days (48 hours).
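For illustration, the folder-per-class layout described above maps directly onto a standard image-classification dataset. The following is a minimal PyTorch sketch (not the authors' code); the directory name "beam_images" and the transform are assumptions:

```python
import torch
from torchvision import datasets, transforms

# Assumed folder-per-class layout: beam_images/0/*.png, ..., beam_images/19/*.png,
# where the folder name encodes the FAC lens offset from focus in micrometers.
dataset = datasets.ImageFolder(
    "beam_images",
    transform=transforms.Compose([transforms.Resize((224, 224)),
                                  transforms.ToTensor()]),
)

# ImageFolder orders class folders lexicographically ("0", "1", "10", ...), so keep an
# explicit class-index -> offset lookup instead of assuming index == offset.
class_offsets_um = torch.tensor([int(name) for name in dataset.classes],
                                dtype=torch.float32)
```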


Fig. 2. Laser beam profiles for the train/validation dataset. The images represent the beam profile with the FAC lens placed in focus (top left corner) and gradually moved in 1 µm steps, up to a 19 µm offset.


The collected images were divided into two parts, as shown in Table S1 (in Supplement 1): a training set (1671 images, ca. 83% of the total dataset) and a validation set (340 images, ca. 17% of the total dataset). The generalization capability of the CNNs was evaluated using a test dataset collected during an actual build of working modules in a Factory Acceptance Test (FAT). The FAT evaluates the performance of a programmed robotic system prior to its deployment in a high-volume manufacturing environment. The FAT test set consisted of 202 first light images, with each image corresponding to a unique emitter collimated with a unique FAC lens. During the laser beam collimation procedure for each emitter, the starting position, where the first light image was taken, was recorded in the database. The final position of the FAC lens was also recorded, thus enabling the calculation of the lens movements during the complete active alignment process. We hypothesized that a well-generalizing trained convolutional neural network could predict these movements with sub-micron accuracy, previously achieved only with an active alignment strategy.

2.2 Architecture of the CNNs

We employed the concept of transfer learning [10], where a convolutional neural network, pre-trained on large-scale image data, is used as a starting block to create new models suitable for a targeted application. The transfer learning paradigm is built around the design of new models in which a significant part of the learnable parameters is preserved intact. Three different CNN architectures were exploited: AlexNet, VGG19_bn, and ResNet152 [11–13]. Our goal was to develop a classifier that returns a vector of probabilities ($p_i$) over the predicted output classes (i.e., “0” to “19”) for an input image of the laser beam. Since the output classes correspond to the distance to the focal position of the lens ($z_f$), we used Eq. (1) to calculate the lens offset from the input image. This equation yields the distance the FAC lens must be moved by the 6-axes aligner along the optical axis $z$, based on the vector of predictions for the input image.

$$z_f = \sum_{i = 1}^{n} p_i \times z_i, \tag{1}$$
where $n$ is the total number of output classes, equal to 20 in our study, and $z_i$ is the numerical representation of class $i$, i.e., $z_1 = 0$ µm and $z_{20} = 19$ µm. While preserving model complexity, this additional calculation reduces the potential prediction error, as it converts the discrete per-image predictions into continuous ones. A simple example is the case where an image is simultaneously attributed to class “3” and class “4” with 40% and 60% probabilities. Employing the formula gives a 3.6 µm offset, compared with the 4 µm offset predicted with a discrete strategy.
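A minimal sketch of Eq. (1), assuming a classifier with a log-softmax output as described below, is shown here; `class_offsets_um` is the per-class offset vector from the dataset sketch in Section 2.1:

```python
import torch

def predicted_offset_um(log_probs: torch.Tensor, class_offsets_um: torch.Tensor) -> torch.Tensor:
    """log_probs: (batch, n_classes) log-softmax output; returns z_f per image in micrometers."""
    probs = log_probs.exp()            # p_i
    return probs @ class_offsets_um    # z_f = sum_i p_i * z_i

# Example from the text: p_3 = 0.4, p_4 = 0.6 -> z_f = 0.4 * 3 + 0.6 * 4 = 3.6 um
```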

The model based on the AlexNet convolutional neural network, which was the baseline model against which the other models were tested, was found to have the best scores with several alterations. The first fully connected layer with 4096 nodes was connected to an additional fully connected layer of 512 nodes with a rectified linear unit (ReLU) activation function, followed by dropout (with 0.4 probability) to prevent potential overfitting. The last layer had 20 output features with a log softmax output function (Fig. 3).
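A minimal PyTorch sketch of such a modified AlexNet head is given below; which layers are frozen and how the pretrained weights are loaded are assumptions on our part, chosen so that only the two new fully connected layers (roughly $2.1 \times 10^6$ parameters) remain learnable:

```python
import torch.nn as nn
from torchvision import models

model = models.alexnet(pretrained=True)       # weights=... in newer torchvision versions
for p in model.parameters():
    p.requires_grad = False                   # transfer learning: freeze pretrained weights

# Keep the pretrained Dropout -> Linear(9216, 4096) -> ReLU head and append the new layers.
model.classifier = nn.Sequential(
    *list(model.classifier.children())[:3],   # Dropout, Linear(9216, 4096), ReLU
    nn.Linear(4096, 512),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.4),
    nn.Linear(512, 20),                       # 20 output classes ("0" ... "19")
    nn.LogSoftmax(dim=1),
)
```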


Fig. 3. Architecture of the AlexNet-based convolutional network. (This and other illustrations of the architecture of the convolutional networks were drawn with LaTeX PlotNeuralNet package; https://doi.org/10.5281/zenodo.2526396).


Similar alterations of the initial architecture of the VGG19 with batch normalization (VGG19_bn) model, with an additional change of the intermediate fully connected layer from 512 to 128 nodes, yielded its best performance. The complete illustration of the constructed VGG19_bn model is presented in Fig. 4. Here, the CNN consists of a series of convolutional blocks (i.e., Conv1 to Conv5) whose weights and biases were learned on large datasets (as we exploited the transfer learning paradigm); the features they extract from the original input image are passed along the depth of the network and eventually flattened and fed through fully connected layers towards the final multiclass prediction. We refer to the paper by Zeiler and Fergus [14] for an intuition behind the learning process of CNNs.


Fig. 4. Architecture of the VGG19_bn-based convolutional network.


In total, the AlexNet-based model has about $60 \times 10^6$ parameters, of which approximately $2.1 \times 10^6$ were learned during the training process (Table 1). By comparison, the more complex VGG19_bn-based model had a smaller number of learnable parameters ($0.5 \times 10^6$), while its total number of parameters, about $1400 \times 10^6$, significantly exceeded that of the AlexNet-based model.


Table 1. Complexity of the Models

Although the architecture of the ResNet152-based convolutional model (Fig. 5) differs significantly from the two previously described models in that its basic building block is a residual cell, the architecture alterations that achieved the best performance were very similar. We introduced the following modifications: the last fully connected layer with 2048 nodes was connected to a layer with 512 nodes and a ReLU activation function, followed by dropout with 0.4 probability. As in the other two models, the final fully connected layer had 20 output features with the log softmax function. In terms of learnable parameters, this model exceeded the other two models by a factor of two (Table 1).
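Interpreting the 2048 nodes as the output of the ResNet152 backbone, the analogous head replacement could be sketched as follows (again an assumption-laden sketch, not the authors' code):

```python
import torch.nn as nn
from torchvision import models

resnet = models.resnet152(pretrained=True)
for p in resnet.parameters():
    p.requires_grad = False                   # freeze the pretrained residual backbone

# Replace the original Linear(2048, 1000) classification layer with the described head.
resnet.fc = nn.Sequential(
    nn.Linear(2048, 512),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.4),
    nn.Linear(512, 20),
    nn.LogSoftmax(dim=1),
)
```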


Fig. 5. Architecture of the ResNet152-based convolutional network.


2.3 Training of the CNNs

Training of the models was performed in the PyTorch environment on an Nvidia Quadro P4200 GPU (CUDA platform). The training and validation datasets contained raw images of the laser beam (8-bit grayscale images with a resolution of 2464 × 2056 pixels) collected with the aligner. The images were loaded as RGB images to fulfill the input requirements of the pretrained models. Loaded images were scaled to 224 × 224 pixels and normalized using the vector of means (0.485, 0.456, 0.406) and the vector of standard deviations (0.229, 0.224, 0.225) for the R, G, and B channels, respectively. The training and validation sets were loaded in batches of three images to reduce the training time. We anticipate that the possibility of using images with a significantly reduced number of pixels (224 versus 2464 in the larger dimension) is directly linked to a potential reduction of the frame grabber resolution, which would decrease the time spent per image during the collection procedure.
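Continuing the dataset sketch from Section 2.1, the preprocessing and loading described above could look as follows; the split seed and the use of random_split for the 1671/340 split are assumptions:

```python
import torch
from torch.utils.data import DataLoader, random_split
from torchvision import transforms

# Resize to 224 x 224 and normalize with the per-channel means and standard deviations quoted above.
dataset.transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# Reported split: 1671 training and 340 validation images, loaded in batches of three.
train_set, val_set = random_split(dataset, [1671, 340],
                                  generator=torch.Generator().manual_seed(0))
train_loader = DataLoader(train_set, batch_size=3, shuffle=True)
val_loader = DataLoader(val_set, batch_size=3, shuffle=False)
```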

The pre-trained networks were loaded onto the GPU, and their architectures were updated as described above (Figs. 3–5). We used the adaptive moment estimation optimizer (Adam) with an initial learning rate of $10^{-4}$. The number of training epochs as well as the network architecture parameters were treated as hyperparameters. During the training procedure, the accuracy was calculated at each step for both the training and validation sets. The most successful layouts for the model architectures were described above, while the numbers of training epochs were 35 for the AlexNet-based model, 14 for the VGG19_bn-based model, and 75 for the ResNet152-based model.
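A single training epoch under the stated settings might be sketched as below; the loss function is not stated in the text, so the negative log-likelihood loss (matching the log-softmax output) is an assumption:

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)                                     # any of the three modified models above

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
criterion = nn.NLLLoss()                             # pairs with the LogSoftmax output layer

def train_one_epoch(loader):
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)      # labels are the class indices "0" ... "19"
        loss.backward()
        optimizer.step()
```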

3. Performance evaluation of the trained CNNs

3.1 Model performance on the training and validation datasets

The predicted values on the training and validation data were evaluated against the expected classes and are indicative of the lens positioning accuracy during data acquisition. The metric used for the evaluation was the root-mean-squared error (RMSE). The best error on the training set, 0.98 µm, was measured for the VGG19_bn-based model (Table 2), followed by the AlexNet- and ResNet152-based models with 1.98 µm and 2.62 µm, respectively.
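The RMSE reported here can be computed from the continuous offsets of Eq. (1); below is a sketch reusing `predicted_offset_um` and `class_offsets_um` from the earlier snippets (all values in micrometers):

```python
import torch

def rmse_um(model, loader, class_offsets_um, device="cuda"):
    model.eval()
    errors = []
    with torch.no_grad():
        for images, labels in loader:
            log_probs = model(images.to(device)).cpu()
            z_pred = predicted_offset_um(log_probs, class_offsets_um)
            z_true = class_offsets_um[labels]        # map class index -> true offset in um
            errors.append(z_pred - z_true)
    errors = torch.cat(errors)
    return torch.sqrt((errors ** 2).mean()).item()
```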


Table 2. Root-mean-squared errors (RMSE) on training, validation, and test datasets

Surprisingly, the best validation error of 1.18 µm was measured for the AlexNet-based model. The other models showed higher errors of 1.98 µm (VGG19_bn-based) and 2.71 µm (ResNet152-based). The error rates were affected by an accidental cross-misclassification between classes “2” and “19” (Fig. 6; visualization for the VGG19_bn-based model), where images were attributed to both classes with relatively high probabilities. This caused the predicted movement for images from class “2” to be greater than the measured one, and vice versa for images from class “19”.


Fig. 6. Prediction evaluation on the validation (a) and train (b) data for the VGG19_bn-based model.


3.2 Performance of the models within a quasi-manufacturing environment

The most critical step was the evaluation of the models using images collected during the actual manufacturing of parts that passed strict industrial requirements. This scenario is considered to represent the behavior of the models in a high-volume manufacturing environment. The evaluation was performed using a test dataset whose size was ca. 10% of the complete dataset used to develop the models. We compared the predicted movements to the actual movements measured during the FAT (here, the active alignment strategy was used). We found strong linear relationships between measured and predicted movements, best described by the following equations:

  • for the AlexNet-based model, $y = 0.84x + 0.95$, $R^2 = 0.92$ (Fig. 7(a));
  • for the VGG19_bn-based model, $y = 0.93x + 0.55$, $R^2 = 0.87$ (Fig. 7(b));
  • and for the ResNet152-based model, $y = 0.87x + 1.0$, $R^2 = 0.87$ (Fig. 7(c)).

The closest agreement with the expected 1:1 relationship (i.e., the $y = x$ line) was observed for the VGG19_bn-based model. Our results show that the best error of 0.89 µm (Table 2) was achieved with the VGG19_bn-based model, while the other models yielded errors of 0.90 µm (AlexNet-based) and 0.91 µm (ResNet152-based). Interestingly, when the predictions were corrected using the linear equations above, the RMSE for the AlexNet-based model dropped to 0.72 µm, while for the other models it slightly increased.
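The post-hoc linear correction could be implemented as sketched below. The text does not specify which variable was treated as independent in the fits, so the fit direction and its inversion here are assumptions:

```python
import numpy as np

def fit_linear_relation(measured_um, predicted_um):
    """Fit predicted = slope * measured + intercept on the FAT data (e.g. ~0.84 and ~0.95 for AlexNet)."""
    slope, intercept = np.polyfit(measured_um, predicted_um, deg=1)
    return slope, intercept

def corrected_prediction(z_pred_um, slope, intercept):
    # Invert the fit so that a raw CNN prediction is mapped back onto the measured-movement axis.
    return (z_pred_um - intercept) / slope
```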


Fig. 7. Prediction evaluation for the trained models: AlexNet-based model (a), VGG19_bn-based model (b), and ResNet152-based model (c). Linear regression analysis between the movement values measured during the assembly of real parts and the movement values predicted by the models from the “first light” image.


The prediction errors were calculated as the difference between the predicted and measured motions. We found that the distributions of the prediction errors followed a bell curve (Fig. 8) centred around the 0 µm position for the VGG19_bn- and ResNet152-based models (Figs. 8(b), 8(c)), while the prediction error distribution for the AlexNet-based model was slightly shifted (Fig. 8(a), raw error). The prediction errors calculated as the difference between the measured movement and the movement predicted after applying the established linear equations (Fig. 8, linear fit error) highlight the previously established fact that an improvement using a best-fit relationship is possible only for the AlexNet-based model.


Fig. 8. Histograms with kernel density estimation of the error distributions for the trained models: AlexNet-based model (a), VGG19_bn-based model (b), and ResNet152-based model (c).


Overall, the VGG19_bn-based model performed better not only in terms of the chosen metric (i.e., RMSE) but also because of the absence of outliers. The latter point is critical considering the ultimate goal of this study: to develop a model that can substitute for active steps in the process and is robust enough for a high-volume manufacturing environment. We believe the errors associated with the presence of outliers would reduce machine robustness, resulting in collisions during long part builds. Further, the core principle of keeping the machine processes as simple and transparent as possible, implemented here in the development of rather uncomplicated CNNs, would be compromised. In addition, outliers could make it impossible to reach the necessary machine yield of working modules and would subsequently lower the overall throughput of the system.

Our study is in line with the current tendency to exploit deep learning algorithms as efficient and versatile tools in the fields of applied optics and robotics. To our knowledge, similar tools, namely a back-propagation artificial neural network (BP-ANN) [15,16] and a CNN [17], have been proposed for FAC lens alignment. Yu and colleagues developed a rather shallow neural network (two hidden layers) that takes laser beam parameters (peak position, centroid position, beam width, and asymmetry) as inputs and predicts the lens offsets from the optimal position. The crucial factor in that work that might limit its implementation on an actual robotic system is the difficulty and complexity of constructing a sufficient dataset to train the model, especially considering that an intrinsically limited Gaussian beam ray-equivalent model was exploited to estimate the input parameters based on the variation of the output parameters (i.e., tilt, defocus, and decenter). Similar issues apply to the model developed by Hoeren and colleagues, where the training dataset was constructed by combining geometric optics and Gaussian-beam approaches to simulate the beam image and the corresponding lens misalignments.

4. Conclusion and outlook

Our approach to performing the collimation of high-power diode laser systems can significantly reduce the assembly time. We compared the time needed for conventional alignment of the FAC lens with that for the CNN-based alignment scenarios, as shown in Fig. S1 (in Supplement 1). For the latter, we determined the time needed to perform predictions on beam images of different dimensions (image dimensions can be pre-set for each camera unit in the robotic aligner). We estimate a significant time reduction from tens of seconds for the conventional active FAC lens assembly to hundreds of milliseconds for both the low- and high-resolution images (e.g., 224 × 224 and 2464 × 2056). Here we call for caution, as the final time for the assembly process using convolutional networks would also depend on the accuracy of the coarse alignment of the FAC lens during the handling procedure, and hence on the time necessary to find a first light image, the single intermediate alignment step that cannot be skipped.
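As a rough illustration of how the per-image prediction latency could be benchmarked on the GPU (the timing loop, warm-up count, and dummy input below are ours, not the authors' procedure):

```python
import time
import torch

def mean_inference_ms(model, size=(224, 224), n=100, device="cuda"):
    model.eval().to(device)
    x = torch.rand(1, 3, *size, device=device)       # stand-in for a preprocessed "first light" image
    with torch.no_grad():
        for _ in range(10):                           # warm-up runs
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        t0 = time.perf_counter()
        for _ in range(n):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / n * 1e3       # average milliseconds per image
```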

It is important to note that our implementation is currently limited to one degree of freedom, i.e., alignment along the optical axis. Alignment for the other degrees of freedom is often pre-corrected and compensated during the lens attachment procedure, where the epoxy for opto-electronics assembly is cured by UV light. Alignment for multiple degrees of freedom was not investigated within this study, so further studies would be necessary to address this question. However, we think that the multi-axis alignment problem could be addressed by employing multiple convolutional networks organized hierarchically as a tree (i.e., Tree-CNN) [18,19], where our model would be the root node. Other leaf CNNs, architecturally similar to the root one, would focus on predicting the misalignments of the other, less critical axes.

Compared to other machine-learning-based methods for FAC assembly, the training of the convolutional network relies on data that can be easily collected in the manufacturing environment, which ensures the robustness of the approach and fulfils deployment requirements. This also allows easy monitoring of the performance of the developed model in industrial applications. The ability to rapidly acquire new training data with the described semi-automated acquisition technique would allow the collection of multiple laser beam images at different locations, and therefore correction for misalignment errors as described by Mirigaldi et al. (2021) [20].

A change in the laser/lens combination, while relatively rare, requires additional reprogramming of the setup. The semi-automated acquisition of a new training set within two working days would thus also lead to faster deployment of an appropriate model. Lastly, we anticipate that a similar method can be applied to the alignment of other micro-optical components, leading to even more significant increases in the number of units manufactured by the machine per hour.

Funding

European Union's Horizon 2020 research and innovation programme, project IQONIC (Grant Agreement no. 820677).

Acknowledgments

The authors thank Torsten Vahrenkamp and Matthias Trinker for their immense support during work on this project. The authors acknowledge Dr. Christoph von Kopylow and Dr. Friedrich Bachmann for useful discussions during preparation of the manuscript.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

Supplemental document

See Supplement 1 for supporting content.

References

1. F. Bachmann, P. Loosen, and R. Poprawe, eds., High Power Diode Lasers: Technology and Applications, Springer Series in Optical Sciences 128 (Springer, 2007). ISBN 978-0-387-34729-5

2. V. Sturm, H. G. Treusch, and P. Loosen, “Cylindrical microlenses for collimating high-power diode lasers,” Proc. SPIE 3097, 717–726 (1997).

3. H. Sun, A Practical Guide to Handling Laser Diode Beams (Springer Series in Physics, 2015). ISBN 978-94-017-9783-2

4. T. Westphalen, S. Hengesbach, C. Holly, M. Traub, and D. Hoffmann, “Automated alignment of fast-axis collimator lenses for high-power diode laser bars,” Proc. SPIE 8965, 89650V (2014).

5. C. Brecher, N. Pyschny, S. Haag, and V. Guerrero Lule, “Automated alignment of optical components for high-power diode lasers,” Proc. SPIE 8241, 82410D (2012).

6. J. Miesner, A. Timmermann, J. Meinschien, B. Neumann, S. Wright, T. Tekin, H. Schröder, T. Westphalen, and F. Frischkorn, “Automated assembly of fast-axis collimation (FAC) lenses for diode laser bar modules,” Proc. SPIE 7198, 71980G (2009).

7. J. Pierer, M. Lützelschwab, S. Grossmann, G. Spinola Durante, C. Bosshard, B. Valk, R. Brunner, R. Bättig, and N. Lichtenstein, “Automated assembly processes of high power single emitter diode lasers for 100 W in 105 μm / NA 0.15 fiber module,” Proc. SPIE 7918, 79180I (2011).

8. Y. Yan, Y. Zheng, J. Duan, and Z. Huang, “Influence of positioning errors of the laser collimator on the beam shape and coupling efficiency,” Opt. Fiber Technol. 58, 102301 (2020).

9. ficonTEC, “Assembly line A800/A1200/A1600,” https://www.ficontec.com/wp-content/uploads/pdf/en/ASSEMBLYLINE-1906-web.pdf

10. H. C. Shin, H. R. Roth, M. Gao, L. Lu, Z. Xu, I. Nogues, J. Yao, D. Mollura, and R. M. Summers, “Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning,” IEEE Trans. Med. Imaging 35(5), 1285–1298 (2016).

11. A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in Advances in Neural Information Processing Systems 25, 1097–1105 (2012).

12. K. Simonyan and A. Zisserman, “Very deep convolutional networks for large-scale image recognition,” arXiv preprint arXiv:1409.1556 (2014).

13. K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778.

14. M. D. Zeiler and R. Fergus, “Visualizing and understanding convolutional networks,” in Proceedings of the 13th European Conference on Computer Vision (ECCV) (2014), pp. 818–833.

15. H. Yu, G. Rossi, A. Braglia, and G. Perrone, “Application of Gaussian beam ray-equivalent model and back-propagation artificial neural network in laser diode fast axis collimator assembly,” Appl. Opt. 55(23), 6530 (2016).

16. H. Yu, G. Rossi, A. Braglia, and G. Perrone, “Artificial neural network assisted laser chip collimator assembly and impact on multi-emitter module beam parameter product,” Proc. SPIE 10085, 1008508 (2017).

17. M. Hoeren, D. Zontar, A. Tavakolian, M. Berger, S. Ehret, T. Mussagaliyev, and C. Brecher, “Performance comparison between model-based and machine learning approaches for the automated active alignment of FAC-lenses,” Proc. SPIE 11262, 1126209 (2020).

18. D. Roy, P. Panda, and K. Roy, “Tree-CNN: A hierarchical deep convolutional neural network for incremental learning,” Neural Networks 121, 148–160 (2020).

19. T. Xiao, J. Zhang, K. Yang, Y. Peng, and Z. Zhang, “Error-driven incremental learning in deep convolutional neural network for large-scale image classification,” in Proceedings of the ACM International Conference on Multimedia (MM 2014), pp. 177–186.

20. A. Mirigaldi, M. Carbone, and G. Perrone, “Non-uniform adaptive angular spectrum method and its application to neural network assisted coherent beam combining,” Opt. Express 29(9), 13269 (2021).
