## Abstract

We propose a fast and accurate autofocus algorithm using Gaussian standard deviation and gradient-based binning. Rather than iteratively searching for the optimal focus using an optimization process, the proposed algorithm directly calculates the mean of the Gaussian-shaped focus measure (FM) curve to find the optimal focus location and uses the FM curve standard deviation to adapt the motion step size. The calculation requires only 3-4 defocused images to identify the center location of the FM curve. Furthermore, by assigning motion step sizes based on the FM curve standard deviation, the magnitude of the motion step is adaptively controlled according to the defocused measure, thus avoiding overshoot and unneeded image processing. Our experiments verified that the proposed method is faster than the state-of-the-art Adaptive Hill-Climbing (AHC) method and offers satisfactory accuracy as measured by root-mean-square error. The proposed method requires 80% fewer images for focusing than the AHC method. Moreover, due to this significant reduction in image processing, the proposed method reduces autofocus time to completion by 22% compared to the AHC method. Similar performance of the proposed method was observed in both well-lit and low-light conditions.

© 2021 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

Production quality control relies heavily on the inline inspection of manufacturing processes, e.g., the real-time monitoring and metrology for the microcontact printing of flexible electronics [1]. The invention of many high-speed and high-accuracy image acquisition and image processing techniques facilitates the measurement of electronic pattern production; however, many of these techniques are not sufficient for industrial inspection because they cannot operate in real time. If image focusing is required for high-speed inline inspection, these imaging techniques can create manufacturing bottlenecks due to their slower autofocus (AF) methods. Similarly, fast and continuous image-based AF is also critical for many other scenarios, such as the imaging and metrology of biological samples [2,3] or scenarios where non-image-based AF methods risk damaging heat- or photo-sensitive biological samples [4].

Image-based AF techniques seek to adjust the distance between the camera sensor plane and the lens plane such that the region of interest (ROI) has the maximum possible sharpness. State-of-the-art image-based AF techniques must either acquire a large number of both defocused and focused images to converge to an optimal focus location [5] or train a focus learning model to estimate the optimal focus location [6–9]. These techniques are too slow for real-time imaging applications, extremely computationally expensive, or dependent on training a model that is specific to the imaging configuration. There exists an unmet need for a high-performance autofocus technology that requires fewer images as input and reduces the computational complexity needed for fast image-processing time.

In this paper, we propose a fast and accurate autofocus algorithm using Gaussian standard deviation and gradient-based binning. Instead of iteratively searching for the optimal focus using an optimization process, the proposed algorithm directly calculates the mean of the Gaussian-shaped FM curve to find the optimal focus location and uses the FM curve standard deviation to adapt the motion step size. The calculation requires only 3-4 defocused images to identify the center location of the FM curve. Furthermore, based on scale-space theory [10], the standard deviation of the curve is used as the base scale for adapting the motion step size. By assigning adaptive step sizes to different bins based on the initial focus measurement, the magnitude of the motion step is determined via the defocused measure, thus resulting in a highly efficient method for image-based AF.

In section 2, we review the state-of-the-art AF techniques. In section 3, we elaborate on our proposed AF method using Gaussian standard deviation and gradient-based binning. In section 4, we evaluate our proposed method by comparing its AF time to completion, root-mean-square error (RMSE), and mean required FM data to those of a fast and accurate Adaptive Hill-Climbing method (AHC) as well as a baseline Global Search method (GS). We summarize our conclusions and future work in section 5.

## 2. State of the art

Classic AF methods in the literature can be divided into two categories, active and passive. Active AF methods use additional hardware, e.g., a laser and motor, to measure the distance between camera and object and maintain a stable focus distance [11]. This approach can provide steady AF control via the information from the ground truth focus measurement but increases the complexity and cost of the system due to the necessary use of a laser. Wei Yang et al. developed a hand-held fiber-optic Raman probe with both active and passive autofocus functions [12]. Active AF methods were introduced to remove limitations on focusing accuracy that can arise with varying image content (e.g., low-contrast imaging samples or low-light conditions) [13]. However, active methods pose their own limitations. In some scenarios, active AF methods can only measure the distance between a reflective surface and the camera. This can create issues when imaging non-reflective targets, where the distance between the region of interest and the camera cannot be measured. Additionally, in some scenarios, active AF methods may not be applicable, e.g., when imaging photo-sensitive slides that cannot be irradiated by a laser beam [4].

Alternatively, rather than using hardware, passive AF methods use a search algorithm to find the best focus position. Passive AF, in essence, is simply stated as the autonomous navigation from a defocused image to a focused one. To establish whether an image is focused or defocused, focus measure functions (FMFs) are used. These FMFs assign a scalar value, called focus measure (FM), to an ROI using a variety of methods. A focus measure curve simply refers to the two-dimensional relationship between the variable distances between sensor and lens planes and the respective FM scores. Passive sensing for AF includes phase-based methods and contrast-based methods. The selection of phase or contrast-based AF depends on the method of image acquisition. If the images are captured by a camera equipped with a phase detector, or by a light field camera, also known as plenoptic camera [5], phase-based methods could be used. Guillaume Chataignier et al. studied the quad-pixel sensor technology that could use phase-based methods as well [14]. Otherwise, contrast-based methods are used for AF control [15].

In contrast-based AF methods, an FM is extracted from the ROI inside an image captured by the camera. The objective of a contrast-based AF procedure is to maximize this FM. Typical FMFs used for AF control vary widely, ranging from simple gradient methods to wavelet and Laplacian based methods [16]. In principle, these methods operate under the focus-defocus contrast assumption. This assumption states that focused images inherently have more detail than defocused images, meaning the relative degree of focus can be quantified by the level of detail present in an image [17]. Many FMFs, such as the Gaussian Derivative (GDER) method chosen for this paper, operate under this basic contrast assumption [18]. Many algorithms have been put forward to fulfill the contrast-based AF task, such as global searching, Fibonacci searching [3,19], rule-based searching [20], curve fitting [21], prediction models [22], combined DFF and DFD methods [1], and structure tensor-based autofocusing algorithms [23]. Global searching ensures the peak FM will not be missed in the AF control process but is limited by its long searching time. Other rule-based searching methods can speed up the searching process but can sometimes converge to local maxima instead of the global maximum. Curve fitting algorithms are often more accurate than other searching methods but require large quantities of FM data and long AF times.

One approach to reducing AF time involves the application of various machine learning algorithms, including supervised learning [24,2,6,9] and reinforcement learning [7,8]. Convolutional neural networks (CNNs) were recently used for both types of learning algorithms. Wang et al. [24] used a deep learning pipeline structure with a global step estimator and a focus discriminator for AF control. Both the global step estimator and the focus discriminator shared a CNN structure, and by using this supervised learning method, AF control was achieved with far fewer search steps than rule-based and Fibonacci searching methods. Wei et al. [2] used a CNN consisting of two convolution blocks followed by two fully connected layers for the time-lapse microscopy of growing and moving cell samples. Similarly, Shajkofci et al. [6] developed DeepFocus, a CNN-based FMF for microscopy AF control that provides an FM curve whose shape, up to axial scaling, is invariant to the sample. Using this novel FMF, AF could be achieved in far fewer iterations than with standard FMFs. Furthermore, Herrmann et al. [9] developed a supervised learning method for both contrast-based AF and phase-based AF where MobileNetV2 [25] was used for their portable device. In Table 1, we summarize the search algorithms used for passive autofocus.

## 3. Theory

#### 3.1 Gaussian model for focus measure

Scale-space theory tells us that the relative degree of blur due to image defocus follows the Gaussian model [10]. It also leads us to the use of the first-order Gaussian derivative to quantify the focus measure. In this case, the quantitative representation of FM calculated by GDER will also follow the same Gaussian model. Although various gradient and Laplacian based FMFs operate faster than GDER [26], these methods do not take advantage of the physical relationship between defocused and focused images defined by true Gaussian blur. The FM curve can be estimated by sampling the FM at different levels of focus. For the purpose of proposing a fast and accurate AF algorithm, the GDER method for FM evaluation is chosen due to its superior ability to consistently return an FM curve resembling the Gaussian model with satisfactory precision, recall, and absolute error [27]. The GDER method used to calculate the FM scores is defined as

$$FM = \frac{1}{NM}\sum\limits_{x,y} {{{[{f({x,y} )\ast {G_x}({x,y,\sigma } )} ]}^2} + {{[{f({x,y} )\ast {G_y}({x,y,\sigma } )} ]}^2}} \tag{1}$$

where $\sigma$ is a scaling constant, $f({x,y} )$ is the image gray value at pixel coordinates $({x,y} )$, ${G_x}({x,y,\sigma } )$ and ${G_y}({x,y,\sigma } )$ are respectively the first-order Gaussian derivatives in the *x*- and *y*-directions at scale $\sigma$, and *NM* is the total number of pixels in the ROI [18]. The Gaussian model describing the FM curve of a single planar ROI calculated by the GDER FMF is defined as

$$FM({{z_n}} )= A\exp \left( { - \frac{{{{({{z_n} - {z_\mu }} )}^2}}}{{2{\sigma ^2}}}} \right) \tag{2}$$

where *A*, ${z_n}$, ${z_\mu }$, and $\sigma$ are a scaling constant, the distance between the sensor plane and lens plane, the distance between the sensor plane and lens plane with the maximum FM score, and the standard deviation of the FM curve given specified ROI content, respectively. Figure 1 visually displays the variable parameters ${z_n}$ and ${z_\mu }$ from Eq. (2) using a focused and defocused condition.
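The GDER focus measure described above can be illustrated with a minimal pure-Python sketch. It assumes the focus measure is the mean of the squared first-order Gaussian derivative responses over the ROI, uses separable kernels truncated at three standard deviations, and replicates edge pixels. The function names, the truncation radius, and the border handling are choices made for this illustration, not details taken from the experiments.

```python
import math

def gaussian_kernels(sigma):
    """Sampled 1-D Gaussian and its first derivative (truncated at 3*sigma)."""
    radius = max(1, math.ceil(3 * sigma))
    taps = range(-radius, radius + 1)
    g = [math.exp(-t * t / (2.0 * sigma ** 2)) for t in taps]
    total = sum(g)
    g = [v / total for v in g]
    dg = [-t / sigma ** 2 * v for t, v in zip(taps, g)]
    return g, dg

def filter_1d(img, kernel, axis):
    """Convolve along columns (axis=0) or rows (axis=1), replicating edge pixels."""
    h, w = len(img), len(img[0])
    r = len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for k, weight in enumerate(kernel):
                off = k - r
                yy = min(max(y - off, 0), h - 1) if axis == 0 else y
                xx = min(max(x - off, 0), w - 1) if axis == 1 else x
                acc += weight * img[yy][xx]
            out[y][x] = acc
    return out

def gder_focus_measure(img, sigma=1.0):
    """Mean squared first-order Gaussian derivative response over the ROI."""
    g, dg = gaussian_kernels(sigma)
    gx = filter_1d(filter_1d(img, g, axis=0), dg, axis=1)  # derivative along x
    gy = filter_1d(filter_1d(img, g, axis=1), dg, axis=0)  # derivative along y
    n_pixels = len(img) * len(img[0])
    return sum(a * a + b * b
               for row_x, row_y in zip(gx, gy)
               for a, b in zip(row_x, row_y)) / n_pixels

# Synthetic ROI: a vertical step edge, and a defocus-like blurred copy of it.
sharp = [[0.0] * 8 + [1.0] * 8 for _ in range(16)]
g_blur, _ = gaussian_kernels(2.0)
blurred = filter_1d(filter_1d(sharp, g_blur, axis=0), g_blur, axis=1)
print(gder_focus_measure(sharp) > gder_focus_measure(blurred))  # → True
```

The sharp edge scores higher than its blurred copy, which is the focus-defocus contrast behavior the FM curve relies on. A production implementation would normally use an optimized image-processing library rather than nested Python loops.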

A typical example of AF utilizing the Gaussian model involves both a fixed sensor and lens plane. For these situations, a focused condition is accomplished by moving the ROI plane (i.e., the object) either closer to or farther away from the sensor and lens planes (i.e., the camera). This situation can be easily adapted to the alternative application of a fixed ROI plane and variable focus lens. Note that ROIs of an object comprised of targets at two or multiple planar positions can be deduced as multimodal Gaussian curves; the multimodal curve being the sum of each independent Gaussian distribution where the number of planes equates to the number of local maxima. With the sole intention of proposing a novel AF algorithm, single planar targets are chosen to eliminate the multimodality of the FM curve to further resemble imaging targets used in machine vision-based sensing, metrology, pattern analysis, and feedback control, as these industries demonstrate the greatest need for fast and accurate AF.

#### 3.2 Gaussian regression for autofocus

Given limited data, the FM curve of Eq. (2) can be approximated via Gaussian regression. Such a regression serves as an excellent way to approximate an FM model and find an optimal focus location [21], but issues surrounding Gaussian regression for AF are twofold: large quantities of data are required to regress a model typically acquired through a slow global search, and if a global search is not used, FM data needs to be local relative to ${z_\mu }$. Non-${z_\mu }$-local FM data is seldom used because the necessary step size between points to gather information that accurately describes the curve remains unknown during AF. Furthermore, regression models aim to fine-tune all parameters to minimize a specified loss function, which is both computationally expensive and unnecessary as ${z_\mu }$ is the only parameter describing the location of the optimal focus. Hence, using Gaussian regression-based methods for all three parameters of Eq. (2) for an optimal focus search is unreasonable for fast and accurate AF control.

#### 3.3 Direct Gaussian mean calculation

As previously stated, achieving complete AF control requires knowing the location of ${z_\mu }$ at the instantaneous level of defocus. Mathematically, the three parameters *A*, ${z_\mu }$, and $\sigma$ in Eq. (2) can be solved given any three FM data on the curve, so long as they are not linearly correlated. From Eq. (2), the location of the optimal focus, ${z_\mu }$, can be derived by
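One way to obtain such a closed form is sketched here, under the assumption that Eq. (2) is the standard Gaussian $FM(z_n) = A\exp[-(z_n - z_\mu)^2/(2\sigma^2)]$ and writing $f_k$ for the FM at position $z_k$. The symbols $a$ and $b$ are introduced only for this illustration, and the arrangement of terms may differ from that of Eq. (3). Taking logarithms and eliminating $A$ and $\sigma$ gives:

```latex
\begin{aligned}
\ln f_k &= \ln A - \frac{(z_k - z_\mu)^2}{2\sigma^2}, \qquad k \in \{n-2,\; n-1,\; n\} \\
a \equiv \ln f_n - \ln f_{n-1} &= \frac{(z_{n-1}^2 - z_n^2) - 2 z_\mu (z_{n-1} - z_n)}{2\sigma^2} \\
b \equiv \ln f_{n-1} - \ln f_{n-2} &= \frac{(z_{n-2}^2 - z_{n-1}^2) - 2 z_\mu (z_{n-2} - z_{n-1})}{2\sigma^2} \\
z_\mu &= \frac{b\,(z_{n-1}^2 - z_n^2) - a\,(z_{n-2}^2 - z_{n-1}^2)}
              {2\left[\, b\,(z_{n-1} - z_n) - a\,(z_{n-2} - z_{n-1}) \,\right]}
\end{aligned}
```

The denominator vanishes only when the three points are collinear in $(z, \ln f)$, which is the linearly correlated case excluded above.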

The accuracy of the calculated ${z_\mu }$ value compared to that of a fully regressed model depends on many factors. Given a perfect Gaussian, any combination of valid FM data will return the identical value of ${z_\mu }$ using both methods; however, perfect Gaussian FM data for Eq. (3) intended for robust AF control rarely exist. Theoretically, three local FM data with step sizes of one could calculate the correct value of ${z_\mu }$, but due to noise and error from the imaging system, illumination, and motion localization, this method is infeasible. Assessing FM data locally has significant limits for all AF methods, especially for predictive models using highly defocused FM data, where the signal-to-noise ratio poses significant risks to the accuracy of the Eq. (3) calculation. For this reason, a novel method to intelligently select FM data from the Gaussian FM curve is devised in the subsequent section.
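The direct calculation can be illustrated with a short sketch. It assumes the FM curve is an ideal Gaussian and solves the log-transformed model for the peak location from three samples. The function name and the synthetic curve parameters are hypothetical, and the algebraic form is one equivalent arrangement rather than necessarily the exact layout of Eq. (3).

```python
import math

def gaussian_mean_from_three(z, f):
    """Estimate z_mu from three (position, FM) samples ordered n-2, n-1, n."""
    z0, z1, z2 = z
    f0, f1, f2 = f
    a = math.log(f2) - math.log(f1)
    b = math.log(f1) - math.log(f0)
    num = b * (z1 ** 2 - z2 ** 2) - a * (z0 ** 2 - z1 ** 2)
    den = 2.0 * (b * (z1 - z2) - a * (z0 - z1))
    if den == 0.0:
        # Points are collinear in (z, ln f): the peak cannot be located.
        raise ValueError("samples are linearly correlated; z_mu is undetermined")
    return num / den

# Synthetic, noise-free FM curve (illustrative parameters only).
A, Z_MU, SIGMA = 5.0, 120.0, 30.0
positions = [90.0, 112.5, 135.0]  # bilateral to the peak
scores = [A * math.exp(-((z - Z_MU) ** 2) / (2 * SIGMA ** 2)) for z in positions]
print(round(gaussian_mean_from_three(positions, scores), 6))  # → 120.0
```

With noise-free data the peak is recovered exactly. With real FM data, the logarithms amplify noise in low-FM samples, which is why the choice of sample positions matters.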

#### 3.4 Step sizes based on focus measure curve standard deviation

Achieving an accurate value of ${z_\mu }$ using Eq. (3) is highly dependent on obtaining FM data that distinctly follow Gaussian curvature (i.e., non-linear, non-asymptotic, and minimally influenced by noise). Theoretically, three data unilateral to ${z_\mu }$ could return an accurate calculation; however, small variations in these data (i.e., system noise) can have drastic effects on the accuracy of Eq. (3), even if the FM data are non-local. To virtually remove this limitation from the direct calculation, the three FM data are required to be located bilateral to ${z_\mu }$. Introducing the simple condition ${f_n} < {f_{n - 1}}$, provided the data are Gaussian and steps are made in the direction of ${z_\mu }$, guarantees that ${f_{n - 2}}$, ${f_{n - 1}}$, and ${f_n}$ are not linearly correlated and are located bilateral to ${z_\mu }$. AF situations that satisfy this condition do so because ${z_\mu }$ resides in the range bounded by ${z_n}$ and ${z_{n - 2}}$. Using this condition to limit the influence of system noise is essential if the proposed method is to be robust enough for fast and accurate AF control. Figure 2 illustrates the method for acquiring the bilateral FM data necessary to use Eq. (3) for Gaussian model AF control.

Three FM data that distinctly follow Gaussian curvature can accurately calculate ${z_\mu }$ using Eq. (3); however, knowing the necessary step size between three FM data to achieve this remains virtually unknown. As the dispersion of the Gaussian FM curve is governed by its standard deviation, we introduce a step size based on the standard deviation of the FM curve. For various defocus conditions, we will get different Gaussian FM scores. This is a direct reflection of scale-space theory and the resultant point spread at different scales quantified by the GDER FMF. Therefore, it is appropriate for us to use the standard deviation as the base scale to adapt the step size at different defocus regions of the FM curve. Zhenbo Ren et al. corroborated this finding in their own work through introducing Gaussian blur to an image by increasing its content standard deviation [23].

Introducing an adaptive step size based on the fundamental dispersion of the FM curve can ensure that the three FM data meet the previously mentioned conditions and are obtained without significant overshoot, a drawback of many AF control algorithms today. For instance, in Fig. 2, given an initial location ${z_{n - 2}}$ at a distance of 1σ from ${z_\mu }$, a step size of 0.75σ would guarantee that ${f_n} < {f_{n - 1}}$ using only three FM data. However, the same 0.75σ step size at a different ${z_{n - 2}}$ position may not be as effective for fast AF, as either more FM data or a different step size would be required to satisfy ${f_n} < {f_{n - 1}}$ without significant overshoot. This method of adapting the step size as either multiples or fractions of the standard deviation of the FM curve performs optimally, so long as the proper σ step size is chosen.
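The 0.75σ example can be checked numerically. The sketch below assumes a normalized Gaussian FM curve ($A = 1$) and expresses positions as signed distances from ${z_\mu }$ in units of σ. The second start position (3σ from focus) is a hypothetical case added to show how the same step size needs more FM data from farther away.

```python
import math

def fm(d):
    """Normalized Gaussian FM at signed distance d from z_mu, in units of sigma."""
    return math.exp(-0.5 * d * d)

def sampled(start, step, count):
    """FM values (rounded) at start, start + step, ... for `count` positions."""
    return [round(fm(start + k * step), 3) for k in range(count)]

# Start 1 sigma from focus: the third sample already drops below the second.
print(sampled(-1.0, 0.75, 3))  # → [0.607, 0.969, 0.882]

# Start 3 sigma from focus: six samples are needed before the FM decreases.
print(sampled(-3.0, 0.75, 6))  # → [0.011, 0.08, 0.325, 0.755, 1.0, 0.755]
```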

The σ of the ROI modeled by the GDER FMF which will be used as the base scale for an adaptive step size can be calculated using any three FM data, provided the data meet the same conditions necessary for Eq. (3). The σ of the GDER FM curve is defined as
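A form consistent with the Gaussian model can be sketched as follows, assuming Eq. (2) is $FM(z_n) = A\exp[-(z_n - z_\mu)^2/(2\sigma^2)]$, writing $f_k$ for the FM at $z_k$, and taking ${z_\mu }$ as already known from the direct mean calculation. The arrangement is illustrative and may differ from that of Eq. (4).

```latex
\begin{aligned}
\ln f_n - \ln f_{n-1} &= \frac{(z_{n-1} - z_\mu)^2 - (z_n - z_\mu)^2}{2\sigma^2} \\
\sigma &= \sqrt{\frac{(z_{n-1} - z_\mu)^2 - (z_n - z_\mu)^2}{2\left(\ln f_n - \ln f_{n-1}\right)}}
\end{aligned}
```

As a check, with $z_\mu = 0$, $z_{n-1} = -0.25$, and $z_n = 0.5$ on a unit-width curve, the numerator is $0.0625 - 0.25 = -0.1875$ and the denominator is $2(-0.125 + 0.03125) = -0.1875$, giving $\sigma = 1$.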

#### 3.5 Gaussian derivative binning

According to scale-space theory, the FM curve, measured by the GDER FMF, quantifies the change of scale of an ROI as a function of the distance between the sensor and lens planes for any camera and lens. So long as the position of the ROI remains unchanged, the distance between sensor and lens plane (${z_n}$) is the only variable that can change the image scale perceived by the camera sensor. If the planar position of the ROI changes, there are now *two* independent variables affecting the perceived scale of the image, the previous distance between sensor and lens planes (${z_n}$) and the new distance between sensor and lens planes corresponding to the maximum FM (${z_\mu }$). The difference of these two variables is expressed in Eq. (2) as ${z_n} - {z_\mu }$. When the difference of these two variables is zero, meaning ${z_n} = {z_\mu }$, the FM curve reaches its maximum with its corresponding minimum image scale. This location, where both planar positions are equivalent, is the location of the optimal focus. If the location of the maximum FM and its respective minimum scale depends solely on the positions of ${z_n}$ and ${z_\mu }$, we can conclude that σ in Eq. (2) is invariant to the position. Furthermore, this allows us to conclude that σ only serves to describe the dispersion of image scale relative to ${z_\mu }$.

If the σ of the FM curve is assumed to be invariant to position, the σ of the first-order derivative of the Gaussian FM curve ($FM^{\prime}$) would also remain invariant to position, suggesting that the approximate distance from ${z_{n - 2}}$ to ${z_\mu }$ can be estimated via the $FM^{\prime}$ at ${z_{n - 2}}$. Because the distance between ${z_{n - 2}}$ and ${z_\mu }$ can be approximated solely from the value of $FM^{\prime}$ at ${z_{n - 2}}$, gradient bins can be set at specific intervals to define a relationship between the local gradient and the step size required to gather ideal FM data. The value of $FM^{\prime}$ calculated via the difference between any two FM data at the beginning of AF serves as an excellent approximation to the Gaussian derivative and requires one fewer FM datum than if the proper Gaussian derivative were used. The $FM^{\prime}$ calculated at the beginning of AF can be categorized into a specified bin to return the necessary step size as a multiple of the base scale σ. The minor influence of system noise when calculating $FM^{\prime}$ in this manner can categorize an initial defocused position into the incorrect gradient bin, which is experimentally visible in section 4.5, where the mean number of FM data required for $N = 32$ AF experiments is slightly greater than the theoretical five. This influence can be minimized by increasing the interval over which $FM^{\prime}$ is calculated. The assumption of similar σ values at various scales will be experimentally shown in section 4.2. The specific gradient bins and their corresponding σ step sizes are illustrated in Fig. 3 and defined in Table 2.

The intervals of the various gradient bins and their respective step sizes are selected such that three AF motions have a large probability of satisfying the condition ${f_n} < {f_{n - 1}}$. Local $FM^{\prime}$ values that categorize the initial defocus position into Bin 1 have a low probability of containing the optimal focus, and thus large step sizes of 2σ are made in the direction of ${z_\mu }$ determined by the sign of $FM^{\prime}$. Similarly, a local $FM^{\prime}$ value that categorizes the initial defocus position into Bin 2 is taken to be closer to the optimal focus, and thus smaller step sizes of 1σ are used to minimize overshoot while still satisfying the same condition ${f_n} < {f_{n - 1}}$. A change in the plane of ${z_\mu }$, i.e., a new optimal focal plane located at ${z_\mu }$, can be modeled similarly to a simple step input in control theory. A reduction in the overshoot of this parametric function will reduce the time to completion and thus is a main priority in the AF control process. The interval of Bin 3 is defined with the intention of having Bin 3 contain completely unique $FM^{\prime}$ values. If Bin 3 were to encompass the entire range of -0.5σ to 0.5σ (i.e., the grey and white regions in Fig. 3), the interval would not contain $FM^{\prime}$ values that are completely unique to Bin 3. This would limit the binning process, as a single $FM^{\prime}$ value could be assigned to multiple bins and thus the initial defocused position could not be properly identified. To compensate for this, the interval is chosen so that Bin 3 excludes the depth of field (DOF) region (i.e., the region considered to be focused given the specified lens, shown in grey in Fig. 3). Defining the Bin 3 interval in this manner offers excellent experimental performance, as AF is not required in the DOF region and the DOF region can easily be identified by its large FM values.

Figure 4 illustrates the relationship between gradient bins and FM data with their respective adapted σ step sizes.
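The mapping from a local gradient to a signed step can be sketched as below. The threshold values are placeholders and are not the bin edges of Table 2. The sketch covers only the Bin 1 (2σ) and Bin 2 (1σ) step sizes stated in the text, and assumes for illustration that a larger $|FM^{\prime}|$ maps to a smaller multiple of σ. Function and variable names are hypothetical.

```python
def finite_difference(z_a, f_a, z_b, f_b):
    """Approximate FM' from two FM data taken at the start of AF."""
    return (f_b - f_a) / (z_b - z_a)

def step_for_gradient(fm_prime, sigma, bins):
    """Map |FM'| to a signed step that points toward increasing FM (toward z_mu).

    bins: (min_abs_gradient, sigma_multiple) pairs sorted by ascending threshold.
    """
    multiple = bins[0][1]
    for threshold, candidate in bins:
        if abs(fm_prime) >= threshold:
            multiple = candidate
    direction = 1.0 if fm_prime >= 0 else -1.0
    return direction * multiple * sigma

# Placeholder thresholds, NOT the values of Table 2.
BINS = [(0.0, 2.0),   # Bin 1: shallow gradient, far from focus -> 2 sigma steps
        (0.01, 1.0)]  # Bin 2: steeper gradient, nearer focus   -> 1 sigma steps

print(step_for_gradient(finite_difference(30, 0.011, 35, 0.018), 30.0, BINS))  # → 60.0
print(step_for_gradient(finite_difference(150, 0.61, 155, 0.51), 30.0, BINS))  # → -30.0
```

Because the Gaussian derivative is not monotonic in distance from ${z_\mu }$, a practical table would also need the FM value itself to separate the near-focus region, as the discussion of Bin 3 and the DOF region indicates.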

From Fig. 4 it can be seen that certain initial defocused positions may not be able to meet the necessary condition of ${f_{n\; }} < {f_{n - 1}}$ using only three FM data. To avoid this, step sizes defined by the specified gradient bin can be made *until* ${f_{n\; }} < {f_{n - 1}}$. For instance, given the Bin 2, σ steps region in Fig. 4, the condition of ${f_{n\; }} < {f_{n - 1}}$ can be achieved with a minimum of three FM data and a maximum of four. Similarly, if the initial defocus position resides in the Bin 1, 2σ steps region in Fig. 4, there is a minimum of three FM data required, and a maximum determined by the remaining defocus step positions in the working distance of the lens. Nevertheless, step sizes of 2σ from extreme levels of defocus provide an efficient path to the location of ${z_\mu }$ and still guarantee that at least two FM data will satisfy the necessary condition. All FM data (i.e., instances where three or more are collected to satisfy ${f_{n\; }} < {f_{n - 1}}$) are displayed as the array,
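The stopping rule described in this paragraph can be sketched as a loop that keeps stepping until the newest FM value falls below the previous one, or until the working distance is exhausted. The `measure` callback, the position limits, and the synthetic curve are assumptions for illustration; in a real system `measure` would move the lens, capture an image, and evaluate the FMF.

```python
import math

def collect_until_drop(measure, z_start, step, z_min, z_max):
    """Sample FM at z_start, z_start + step, ... until f_n < f_(n-1).

    At least three samples are taken when the range allows it. Returns the
    positions and FM values in acquisition order.
    """
    zs = [z_start]
    fs = [measure(z_start)]
    while True:
        z_next = zs[-1] + step
        if not (z_min <= z_next <= z_max):
            break  # no remaining positions in the working distance
        zs.append(z_next)
        fs.append(measure(z_next))
        if len(fs) >= 3 and fs[-1] < fs[-2]:
            break  # the last three samples now straddle the peak
    return zs, fs

def gaussian_fm(z, z_mu=120.0, sigma=30.0):
    """Synthetic, noise-free FM curve used only for this illustration."""
    return math.exp(-((z - z_mu) ** 2) / (2 * sigma ** 2))

far_z, far_f = collect_until_drop(gaussian_fm, 20.0, 60.0, 0.0, 300.0)   # 2 sigma steps
near_z, near_f = collect_until_drop(gaussian_fm, 95.0, 30.0, 0.0, 300.0)  # 1 sigma steps
print(far_z)   # → [20.0, 80.0, 140.0, 200.0]
print(near_z)  # → [95.0, 125.0, 155.0]
```

The caller should confirm that the final pair satisfies the condition before using the last three samples, since the loop also exits when the range runs out.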

Moreover, defining accurate and effective gradient bins requires knowing the approximate Gaussian model that will be repeatedly navigated for the chosen AF task. Because both σ and the gradient curves are assumed to remain constant, this model can be constructed through a simple global search conducted prior to AF. This global search takes FM data at every position in the working distance of the lens and calculates the approximate σ value according to Eq. (4). Subsequently, the gradient of the FM curve is calculated, and the specific gradient bin intervals with corresponding $FM^{\prime}$ values are designated at the σ locations mentioned previously. After this model has been constructed, fast and accurate AF control can be achieved. The workflow of the proposed Gaussian standard deviation and gradient-based binning method is detailed in Fig. 5.
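The pre-AF model construction can be sketched as follows, assuming a noise-free sweep and a Gaussian FM curve. The σ estimate solves the log-transformed Gaussian model from the peak sample and one other sample. The `offset` parameter, the use of the raw maximum as ${z_\mu }$, and the central-difference gradient are hypothetical choices for this illustration, not the exact procedure used in the experiments.

```python
import math

def sigma_from_pair(z_mu, z_a, f_a, z_b, f_b):
    """Solve the Gaussian model for sigma from two samples and a known peak z_mu."""
    num = (z_a - z_mu) ** 2 - (z_b - z_mu) ** 2
    den = 2.0 * (math.log(f_b) - math.log(f_a))
    return math.sqrt(num / den)

def calibrate(zs, fs, offset=10):
    """Return (z_mu, sigma, gradient) from a global-search sweep.

    offset is the number of samples between the peak and the second sample
    used for sigma (a hypothetical choice).
    """
    peak = max(range(len(fs)), key=fs.__getitem__)
    other = peak + offset if peak + offset < len(fs) else peak - offset
    if other < 0:
        raise ValueError("sweep is too short for the requested offset")
    sigma = sigma_from_pair(zs[peak], zs[peak], fs[peak], zs[other], fs[other])
    # Central differences approximate FM' at the interior positions.
    gradient = [(fs[i + 1] - fs[i - 1]) / (zs[i + 1] - zs[i - 1])
                for i in range(1, len(zs) - 1)]
    return zs[peak], sigma, gradient

# Synthetic sweep over every motor position (illustrative parameters only).
zs = list(range(0, 201))
fs = [4.0 * math.exp(-((z - 120) ** 2) / (2 * 30.0 ** 2)) for z in zs]
z_mu, sigma, gradient = calibrate(zs, fs)
print(z_mu, round(sigma, 6))  # → 120 30.0
```

With measured data, smoothing or regressing the sweep before taking the maximum would make the estimate less sensitive to noise.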

## 4. Experimental results

#### 4.1 Experimental set-up

Proper evaluation of the proposed method first required designing a system capable of housing the hardware necessary for variable focus image acquisition. The selected hardware includes a Computar M2514-MP2 25 mm lens with a working distance of 100-900 mm, a Basler acA2040-120um 3.2 MP resolution mono USB 3.0 camera, an Epson EM-293 step motor (AF step motor), and a 2 mm pitch gear and belt system responsible for changing the lens object distance. The Computar C-mount lens has a fixed focal length of 25 mm and an aperture range from F1.4 to F16. Figure 6(b) shows a rendered drawing of the 3D printed housing fitted with the selected image acquisition hardware. The AF control of the image acquisition system (IAS) begins with a greyscale image taken by the camera. This image is sent to the computer via USB 3.0, where the GDER FMF calculates the current FM and outputs the next motor movement to the Arduino Uno. Using an Adafruit Motorshield V2 motor driver, the AF step motor turns the 2 mm pitch gear and belt system to control the lens before another image is taken and the process is repeated. A single completion of this IAS feedback loop is responsible for one image, meaning the mean FM data acquired simply refers to how many of these IAS loops are needed to return a theoretically focused ROI. This control feedback loop of the IAS can be followed in Fig. 6(c). Each AF method was programmed in a MATLAB R2019b environment with an Intel Core i9-9900X CPU @ 3.60 GHz, an NVIDIA GeForce RTX 2080, and 64 GB of RAM.

AF time to completion is highly dependent on the distance between the initial defocused position and ${z_\mu }$, as motor movements are by far the greatest contribution to time. For this reason, devising an unbiased method for AF evaluation requires repeatedly introducing random levels of defocus to the IAS before AF. This was achieved by fixing the IAS via aluminum extrusion to a 1979 Nikon Measurescope modified and fitted with a Nema 17 step motor ($z$ step motor in Fig. 6(a)) for autonomous *z* direction control. Changing the *z* position of the IAS to random positions between the minimum and maximum of the working distance of the lens ensures that all AF methods are evaluated under similar initial defocused conditions. Prior to evaluation, a variety of targets were chosen to assess the robustness of each AF method to varying levels of ROI contrast. Under ideal lighting conditions, all three AF methods were evaluated for $N = 32$ experiments using Images 1-3 from Fig. 7(a), with an illuminance of approximately 3500 lx measured with a Hioki Lux Meter. Additionally, the proposed GB algorithm was evaluated for $N = 32$ experiments using Images 1-2 under low-light conditions measured at 450 lx (Fig. 7(b)). Images from both Fig. 7(a) and 7(b) are stills of size 350 × 350 pixels captured directly from a video at approximately 150 fps with 1900 µs exposure time. Section 4.5 discusses the results of the evaluation, which are displayed in Table 4.

#### 4.2 Focus measure curve standard deviation invariance

As mentioned in section 3.5, the concept of an adaptive step size based on the dispersion of the GDER FM curve, σ, operates under the assumption that σ is invariant to position. The proposed method assumes unchanged ROI content during repeated AF (i.e., the relative amount of detail presented to the GDER FMF remains similar). As previously mentioned, if the ROI content remains the same, implying the same target is being imaged repeatedly at different planar positions (e.g., metrology, pattern analysis, microscopy), the physics of the lens itself is responsible for any variation in σ. This effect is due to slight changes in the DOF at different object distances. These slight changes ultimately influence the maximum clarity of the image presented to the GDER FMF and, in turn, the σ. Because the variation in σ is minimal, this assumption is shown to be satisfactory throughout our experiments. It is important to note the limitations that arise from needing to construct an assumed Gaussian model prior to AF. These limitations are addressed in section 5 regarding our future work. Figure 8 visually shows the similarity between the GDER FM and $FM^{\prime}$ curves for Image 1 (shown in Fig. 7(a)). Table 3 displays the σ of Image 1 with respect to different ${z_\mu }$ locations, calculated using Eqs. (4) and (3), respectively.

#### 4.3 AF methods for comparative evaluation

To properly assess the speed and accuracy of the proposed algorithm, the proposed *Gaussian Binning* (GB) method is directly compared to a fast and accurate *Adaptive Hill-Climbing* method (AHC). Traditional Hill-Climbing methods are among the most popular rule-based methods used for AF control. These Hill-Climbing algorithms iteratively take photos along the FM curve at a specified interval and continually move in the direction of increasing FM values, stopping when they have arrived at a peak [28]. This simple rule-based approach performs optimally in many situations but can converge to “false peaks” [20]. Furthermore, traditional Hill-Climbing algorithms are iterative, meaning many FM data are required, which creates long image processing times. The AHC method significantly improves upon the traditional rule-based Hill-Climbing methods through the introduction of an adaptive step size [28]. This adaptive step size both reduces AF time by using fewer FM data and reduces the frequency of “false peak” convergences by minimizing the influence of system noise. The AHC’s improvements to traditional Hill-Climbing regarding robustness and speed allow it to serve as an excellent industry standard for evaluating the proposed GB method.
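For orientation, the traditional fixed-step Hill-Climbing rule described above can be sketched as follows. This is the basic rule only, not the AHC method of [28], whose adaptive step logic is not reproduced here. The `measure` callback, the position limits, and the synthetic FM curve are assumptions for illustration.

```python
import math

def hill_climb(measure, z_start, step, z_min, z_max):
    """Fixed-step hill climbing: move toward increasing FM, stop when FM falls.

    Returns the best position found and the number of images evaluated.
    """
    z_best, f_best = z_start, measure(z_start)
    images = 1
    for direction in (1, -1):
        moved = False
        while z_min <= z_best + direction * step <= z_max:
            z_next = z_best + direction * step
            f_next = measure(z_next)
            images += 1
            if f_next <= f_best:
                break  # FM stopped increasing in this direction
            z_best, f_best = z_next, f_next
            moved = True
        if moved:
            break  # a peak was reached; no need to try the other direction
    return z_best, images

def gaussian_fm(z, z_mu=120.0, sigma=30.0):
    """Synthetic, noise-free FM curve used only for this illustration."""
    return math.exp(-((z - z_mu) ** 2) / (2 * sigma ** 2))

print(hill_climb(gaussian_fm, 20.0, 5.0, 0.0, 300.0))   # → (120.0, 22)
print(hill_climb(gaussian_fm, 150.0, 5.0, 0.0, 300.0))  # → (120.0, 9)
```

The image counts grow with the distance to the peak divided by the step size, which is the cost that adaptive step sizes aim to reduce.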

Accurately assessing the performance of the proposed AF method also requires a base-line comparison. As previously mentioned, Global Search methods (GS) are one of the most basic rule-based approaches to AF. GS methods require many FM data, are computationally expensive, and are extremely slow. GS methods do however demonstrate a general lower limit in terms of AF speed and accuracy and thus will serve as a base-line comparison to the AHC and GB AF methods.

#### 4.4 Metrics for evaluation

The three AF methods will be comparatively evaluated using the following metrics: RMSE, mean time to completion, and mean FM data acquired.

The RMSE of the AF process is defined as

$$\mathrm{RMSE} = \sqrt {\frac{1}{N}\sum\limits_{i = 1}^{N} {{\left( {z_{\mu i}} - \widehat {{z_{\mu i}}} \right)}^2} }$$

where *N* is the number of repeated experiments, ${z_{\mu i}}$ is the ground truth location of the optimal focus, and $\widehat {{z_{\mu i}}}$ is the final lens plane location $\widehat {{z_\mu }}$, represented by the step motor position, for experiment *i*. Returning a focused image during the AF process implies that the final estimated optimal focus position $\widehat {{z_\mu }}$ resides within the DOF of the lens. The upper and lower bounds of the DOF, represented by ${z_{\mu + 6}}$ and ${z_{\mu - 6}}$, are displayed in Fig. 9. The Computar M2514-MP2 machine vision lens used for evaluation has a DOF of 25 mm, which equates to approximately 12 step motor steps using the gear and belt system shown in Fig. 6(b). Consistently placing $\widehat {{z_\mu }}$ within the DOF range specified by the lens implies that a satisfactory AF experiment has been completed.

Finding the location of ${z_\mu }$ after each successful AF experiment to calculate RMSE requires a small local search centered around the position of $\widehat {{z_\mu }}$. For all *N* experiments, searching 15 steps on either side of $\widehat {{z_\mu }}$ was sufficient to obtain ${z_\mu }$ in the local search of the FM curve. From there, a finer ${z_\mu }$ was calculated via a Levenberg-Marquardt regression using only the FM data located in the DOF (the blue line in Fig. 9). This small regression removes slight variations due to system noise that might otherwise shift the location of the true ${z_\mu }$. Figure 9 illustrates the post-AF local search FM data, DOF bounds, and regression, as well as the estimated, $\widehat {{z_\mu }}$, and ground truth, ${z_\mu }$, optimal focus positions.
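The RMSE metric and the DOF acceptance criterion described above can be sketched in a few lines. This is only an illustrative sketch: the function names are hypothetical, the positions in the usage lines are invented step-motor positions rather than measured data, and the half-width of 6 steps is taken from the ${z_{\mu \pm 6}}$ bounds quoted in the text.

```python
import math


def rmse(z_true, z_est):
    """Root-mean-square error between ground-truth and estimated focus positions."""
    if len(z_true) != len(z_est) or not z_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - e) ** 2 for t, e in zip(z_true, z_est)]
    return math.sqrt(sum(squared) / len(squared))


def within_dof(z_true, z_est, half_dof_steps=6):
    """Flag each experiment whose estimate lies within +/- half_dof_steps of the truth."""
    return [abs(t - e) <= half_dof_steps for t, e in zip(z_true, z_est)]


# Invented positions (in step-motor steps) for four hypothetical experiments.
z_true = [120, 118, 121, 119]
z_est = [122, 117, 121, 115]

print(f"RMSE: {rmse(z_true, z_est):.3f} steps")
print("all runs inside DOF:", all(within_dof(z_true, z_est)))
```

An RMSE below the DOF half-width is consistent with, but does not by itself guarantee, every individual run landing inside the DOF, which is why the per-run check is kept separate.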

The mean time to completion simply refers to the average time from initial defocused *z* position to completion of the final motor movement at $\widehat {{z_\mu }}$ given *N* AF experiments (e.g., after final motor movement in Fig. 4).

The mean FM data acquired refers to the average number of images processed given *N* AF experiments. This number equates to the average number of completions of the image acquisition system feedback loop (shown in Fig. 6(c)) required to arrive at $\widehat {{z_\mu }}$. This metric quantifies both computational cost and power consumption and thus should be minimized.
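The two averaged metrics, and the relative reductions quoted when methods are compared, amount to simple bookkeeping over per-experiment logs. The sketch below assumes each experiment is logged as a `(time_ms, n_images)` pair; the log format, the function names, and every number in the example are invented for illustration and are not measurements from this work.

```python
from statistics import mean


def summarize(runs):
    """Return (mean time to completion in ms, mean FM data acquired).

    runs: list of (time_ms, n_images) tuples, one per AF experiment.
    """
    return mean(t for t, _ in runs), mean(n for _, n in runs)


def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Invented logs for two hypothetical methods, three experiments each.
method_a = [(900.0, 30), (1100.0, 34), (1000.0, 32)]
method_b = [(750.0, 8), (850.0, 8), (800.0, 8)]

time_a, images_a = summarize(method_a)
time_b, images_b = summarize(method_b)
print(f"time reduction:  {percent_reduction(time_a, time_b):.1f}%")
print(f"image reduction: {percent_reduction(images_a, images_b):.1f}%")
```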

#### 4.5 Results and discussion

Table 4 compares the proposed novel GB method against the AHC and base-line GS methods for Images 1-3 in ideal lighting conditions. Table 4 also includes an average of performances across all ideal lighting experiments.

The proposed method outperforms the GS and AHC methods in both mean FM data acquired and mean time to completion, while still offering satisfactory RMSE for all targets. The inclusion of an adaptive step size, based on the σ of the FM curve and determined via gradient-based binning, reduced the number of images needed for AF by approximately 80% and the focus time by 22% when compared to the leading AHC method. Furthermore, the GB method is robust to low-light conditions: as shown by the low-light results in Table 5, it offered similar performance while still maintaining sufficient accuracy as measured by RMSE. Our AF method also requires far fewer images and shorter focus times than the AF algorithm proposed by Zhang et al. [1].

From the results displayed in Table 4, we can see that replacing the AHC method with GB reduces the total AF time by only 22%, even though the mean number of FM data acquired decreases by 80%. The experiment’s hardware, namely the stepper-motor-and-belt lens driver, is the bottleneck of this AF system regarding focus time. From Fig. 10 we can see that 77.4% of the GB method’s overall AF time is attributed to the AF step motor. For an AF time of 500 ms, motor movement accounts for 387 ms. Therefore, upgraded hardware is necessary to evaluate an AF algorithm that is as fast as the proposed GB method. The stepper motor can be replaced with a faster driver, e.g., a piezo driver, which takes only 3.3 ms for a full-range movement [29]. A driver this fast would reduce the motor motion time to less than 1% of that of the AF step motor used in this paper.
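The time-budget arithmetic in this paragraph can be checked directly. The sketch below uses only the figures quoted in the text (500 ms example AF time, 77.4% motor share, 3.3 ms piezo full-range move); the variable names are illustrative, and treating one full-range piezo move as the entire motion cost is an assumption made here for a rough comparison, not a measured result.

```python
af_time_ms = 500.0         # example AF time used in the text
motor_fraction = 0.774     # share of GB AF time spent on motor motion (Fig. 10)
piezo_full_range_ms = 3.3  # full-range piezo move time quoted from [29]

# Split the example AF time into motor motion and everything else.
motor_ms = af_time_ms * motor_fraction
other_ms = af_time_ms - motor_ms

# Assumption: compare a single full-range piezo move against the total stepper motion time.
piezo_vs_motor = piezo_full_range_ms / motor_ms

print(f"motor motion:    {motor_ms:.0f} ms")
print(f"everything else: {other_ms:.0f} ms")
print(f"piezo / stepper motion time: {piezo_vs_motor:.2%}")
```

The split reproduces the 387 ms motor figure and gives a piezo-to-stepper ratio below 1%, matching the claim in the text under the stated assumption.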

## 5. Conclusions

In this paper, we first established a relationship between the FM curve standard deviation and the working distance of the AF system. We theorized that the σ describing the dispersion of the FM curve would be invariant to position. Our hypothesis was confirmed in Table 3, which shows that σ remains similar with respect to ${z_\mu }$. The slight variation in σ is due to the physics of the lens system at different object distances. We then devised a novel method using $FM^{\prime}$ binning to obtain a suitable step size according to the base scale σ of the FM curve. By using this adaptive step size, we minimize the number of images needed for AF. At the same time, an innovative direct calculation of the location of the optimal focus was put forward as the core idea of our algorithm. By using this direct calculation, we reduce the image processing time dramatically. A comparison of the GB method with the base-line GS method and the AHC method shows that the proposed method provides significant advantages regarding the number of processed images required for AF. Furthermore, the proposed method has been shown to be robust to both well-lit and low-light conditions.

In the future, we expect that with an upgrade of the driving technology, the GB method will provide an even greater benefit regarding AF time because it requires fewer images for AF. Faster lens motion would widen the advantage of the proposed GB method over the AHC method in AF time. The proposed GB method does, however, have a limitation in its operating assumption. Because the method assumes unchanged ROI content during repeated AF, the scope of the technology is limited to stationary imaging scenarios where the initial Gaussian model and binning limits can be repeatedly recalled. Although the GB method has shown promise for such scenarios, methods for constructing the Gaussian model from a single defocused image are necessary if the proposed method is to be extended to dynamic imaging scenarios. In our future work, we will investigate whether this model, its standard deviation, and the respective binning limits can be inferred from a single defocused image in accordance with scale-space theory. Using the high and low frequency information present in the ROI content, we wish to introduce artificial intelligence to construct our initial Gaussian model.

## Funding

Directorate for Engineering (CMMI-1916866, CMMI-1942185).

## Disclosures

The authors declare no conflicts of interest.

## Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

## References

**1.** X. Zhang, Z. Liu, M. Jiang, and M. Chang, “Fast and accurate auto-focusing algorithm based on the combination of depth from focus and improved depth from defocus,” Opt. Express **22**(25), 31237 (2014).

**2.** L. Wei and E. Roberts, “Neural network control of focal position during time-lapse microscopy of cells,” Sci Rep **8**(1), 7313 (2018).

**3.** A. J. LeSage and S. J. Kron, “Design and implementation of algorithms for focus automation in digital imaging time-lapse microscopy,” Cytometry **49**(4), 159–169 (2002).

**4.** J. H. Price and D. A. Gough, “Comparison of phase-contrast and fluorescence digital autofocus for scanning microscopy,” Cytometry **16**(4), 283–297 (1994).

**5.** R. Ng, M. Levoy, M. Brédif, G. Duval, M. Horowitz, and P. Hanrahan, *Light Field Photography with a Hand-Held Plenoptic Camera*, Stanford University Computer Science Tech Report (Stanford University, 2005).

**6.** A. Shajkofci and M. Liebling, “DeepFocus: a Few-Shot Microscope Slide Auto-Focus using a Sample Invariant CNN-based Sharpness Function,” arXiv:2001.00667 [cs, eess] (2020).

**7.** X. Yu, R. Yu, J. Yang, and X. Duan, “A Robotic Auto-Focus System based on Deep Reinforcement Learning,” arXiv:1809.03314 [cs] (2018).

**8.** C.-C. Chan and H. H. Chen, “Autofocus by deep reinforcement learning,” Electronic Imaging **2019**(4), 577 (2019).

**9.** C. Herrmann, R. Strong Bowen, N. Wadhwa, R. Garg, Q. He, J. T. Barron, and R. Zabih, “Learning to Autofocus,” in 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (IEEE, 2020), pp. 2227–2236.

**10.** J. J. Koenderink, “The structure of images,” Biol. Cybern. **50**(5), 363–370 (1984).

**11.** Z. Zhou, C. Li, T. He, C. Lan, P. Sun, Y. Zheng, Y. Yin, and Y. Liu, “Facile large-area autofocusing Raman mapping system for 2D material characterization,” Opt. Express **26**(7), 9071 (2018).

**12.** W. Yang, F. Knorr, J. Popp, and I. W. Schie, “Development and evaluation of a hand-held fiber-optic Raman probe with an integrated autofocus unit,” Opt. Express **28**(21), 30760 (2020).

**13.** C.-C. Gu, K.-J. Wu, J. Hu, C. Hao, and X.-P. Guan, “Region sampling for robust and rapid autofocus in microscope,” https://onlinelibrary.wiley.com/doi/abs/10.1002/jemt.22484.

**14.** G. Chataignier, B. Vandame, and J. Vaillant, “Joint electromagnetic and ray-tracing simulations for quad-pixel sensor and computational imaging,” Opt. Express **27**(21), 30486 (2019).

**15.** “Autofocus System and Evaluation Methodologies: A Literature Review,” Sens. Mater. 1165 (2018).

**16.** M. I. Shah, S. Mishra, M. Sarkar, and C. Rout, “Identification of robust focus measure functions for the automated capturing of focused images from Ziehl-Neelsen stained sputum smear microscopy slide,” Cytometry A **91**(8), 800–809 (2017).

**17.** A. Papoulis, “The Fourier Integral and its Applications,” Polytechnic Institute of Brooklyn, McGraw-Hill Book Company Inc., U.S.A, ISBN: 67-048447-3 (1962).

**18.** J.-M. Geusebroek, F. Cornelissen, A. W. M. Smeulders, and H. Geerts, “Robust autofocusing in microscopy,” Cytometry **39**(1), 1–9 (2000).

**19.** Y. Xiong and S. A. Shafer, “Depth from Focusing and Defocusing,” in Proc. of the DARPA Image Understanding Workshop (1993), pp. 68–73.

**20.** N. Kehtarnavaz and H.-J. Oh, “Development and real-time implementation of a rule-based auto-focus algorithm,” Real-Time Imaging **9**(3), 197–203 (2003).

**21.** Y. Xiong and S. A. Shafer, “Depth from focusing and defocusing,” in Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (IEEE Comput. Soc. Press, 1993), pp. 68–73.

**22.** C.-M. Chen and H.-M. Lee, “An Efficient Gradient Forecasting Search Method Utilizing the Discrete Difference Equation Prediction Model,” Applied Intelligence **16**(1), 43–58 (2002).

**23.** Z. Ren, E. Y. Lam, and J. Zhao, “Acceleration of autofocusing with improved edge extraction using structure tensor and Schatten norm,” Opt. Express **28**(10), 14712 (2020).

**24.** C. Wang, Q. Huang, M. Cheng, Z. Ma, and D. J. Brady, “Intelligent Autofocus,” arXiv:2002.12389 [eess] (2020).

**25.** M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted Residuals and Linear Bottlenecks,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (IEEE, 2018), pp. 4510–4520.

**26.** S. Pertuz, D. Puig, and M. A. Garcia, “Analysis of focus measure operators for shape-from-focus,” Pattern Recognition **46**(5), 1415–1432 (2013).

**27.** H. Mir, P. Xu, and P. van Beek, “An extensive empirical evaluation of focus measures for digital photography,” in *Digital Photography X* (International Society for Optics and Photonics, 2014), Vol. 9023, p. 90230I.

**28.** W. Xiao and W. G. Dunford, “A modified adaptive hill climbing MPPT method for photovoltaic power systems,” in 2004 IEEE 35th Annual Power Electronics Specialists Conference (IEEE Cat. No.04CH37551) (2004), Vol. 3, pp. 1957–1963.

**29.** “piezosystem jena Inc.,” Microscopy Today **25**(S1), 11 (2017).

**30.** Y.-C. Liu, F.-Y. Hsu, H.-C. Chen, Y.-N. Sun, and Y.-Y. Wang, “A coarse-to-fine auto-focusing algorithm for microscopic image,” in Proceedings 2011 International Conference on System Science and Engineering (IEEE, 2011), pp. 416–419.