Robust contrast enhancement method using a retinex model with adaptive brightness for detection applications

Rizwan Khan; Atif Mehmood; Zhonglong Zheng; Zhonglong Zheng

doi:10.1364/OE.472557

1. Introduction

High-quality images are the building blocks of modern computer vision applications. Capability to capture these images swiftly with a modest computational complexity resulted in the widespread use of digital devices. However, the benefits are confined to the applications working in ideal lighting and normal environmental conditions. Achieving similar results in dark scenarios, where the inadequate lighting and degraded visual conditions defy the camera setting, is a great challenge [1,2]. The applications with the onboard camera’s workings from dawn to dusk in these bad conditions face serious problems in the detection [3] and recognition [4,5]. The purpose of improvement in the color and contrast of the image is to make them more suitable for human and machine vision/detection applications [6].

The lighting and retinex theory of color vision [7] models the color perception of human vision on scenes. However, challenges arise in real-world degraded visual environments, where the captured images suffer from low contrast and heavy noises [8]. Most traditional methods act globally while promoting brightness without considering the neighboring pixel’s energy. Moreover, the direct enhancement process amplifies the noises, objectionable structure, and texture artifacts. The problem has been approached by using a variety of techniques including imaging sensor-oriented methods [9,10], histogram-based hue and range (gamut) preservation [11,12] and retinex based methods [13,14]. But the above challenges are complex and can hardly be solved simultaneously. In the case of a degraded visual environment, the entailed issues are more extensive than before. Images captured in these scenarios frequently have skewed contrast and under-enhancement with objectionable structure and texture artifacts. Existing network based contrast enhancement methods are mainly dependent on image pairs [15,16,17] priors [18,19] and large scale dataset [20]. These methods are classified as supervised (i.e., work with pairs) and unsupervised methods (i.e., no pairs are required). Some of these methods proposed retinex-based decomposition networks [13,15], but heavy noises in dark scenarios and artificial texture amplification with color and contrast distortion deteriorate the global structure of the final image with ill-posed decomposition. A practical image enhancer with denoising operation is proposed in [21] and an enlighten-GAN is proposed in [22], but the former is dependent on image pairs and later relies on careful image selection for enhancement.

In real-world situations, inconsistent lighting in combination with bad weather results in issues with poor vision, low contrast, and color fading [23,24] problems. It ultimately limits the performance of detection frameworks, such as YoloV3 [3], which are supposed to work on high-quality images. Although computational imaging techniques [25][26], expensive sensors in digital cameras [27,28,29] and some filtering techniques [30,31] can somehow help to alleviate these problems. But the images captured in lowlighting conditions still have poor visibility and contrast distortions. At the early attempts, histogram equalization-based methods [32], and gamma correction based [33] methods were proposed to enhance the contrast, but these methods promote brightness without considering the energy of the neighboring pixels with truncation problem [34]. Extreme dark backgrounds and degraded environmental conditions with uneven lighting induce the problems of color highlights and low contrast. It can be fixed by extracting contrast for the darkest areas and addressing the illumination component separately for the color fading problems [35]. But the separation of illumination from the reflection component in the presence of noise in the target scenario is an inverse and highly ill-posed problem. Decomposition-based approaches involve the retinex principle [7], where single-scale retinex and multiscale retinex-based methods [36] are one of the earliest approaches for the color and contrast enhancement [37]. It demands additional regularization constraints to make it a well-posed problem [38]. In other words, the enhancement without a reasonable consistency of decomposition amplifies the noises and induces structure and texture artifacts with fake boundaries, and edges [39]. It limit the performance of tracking, recognition and object detection algorithms [40], [41,42].

The existing methods for lowlight image enhancement (LIME) [43] are mainly based on priors [44] robust retinex model to reveal image structure [45]. A learning-based restoration of back-lit images [46] is proposed with an optimal contrast tone mapping for backlit images. Similarly, the multiview image contrast enhancement method is proposed in [47] for dark scenarios while considering more than one camera. Illumination-based feature fusion FEW [48], global illumination awareness network (Glad-Net), and deep curve estimation Zero-DCE are readily available methods for lowlight image enhancement. A multi-branch lowlight enhancement net (MBLLEN) is proposed in [49]. The most recent combination of network-based and retinex-based models [16] has achieved great success. However, most of these models work in a supervised manner by utilizing the image pairs to enhance the image quality. An unsupervised Enlighten GAN [22] is proposed to solve the challenges of paired training data, but it also demands a careful selection of the input images. Most of the extant work follows a predominant trend of paired training data, and some other gathered large-scale datasets [20]. It raises serious concern about the practical and real-world implications of these methods, where poor visibility and color fading are typical problems [50]. The amplification of the heavy noises in the darkest background regions and inaccurate illumination estimate divert the noise and blackspots on irregular reflection [15]. A joint enhancement and denoising (JED) [51] and a variational model is proposed in [8] for simultaneous denoising and contrast enhancement. However, denoising after enhancement removes fine details and boundaries with noise, whereas denoising before enhancement produces blurry output. Both types of output affect the detection and recognition applications.

In the last two decades, the computer vision community has witnessed excellent strides in improving the quality of images in various environmental conditions [13, 44, 52,–56,]. However, the existing enhancement approaches have higher computational complexity and are restricted in their capacity to generalize the visibility [1]. Some of the existing methods requires a carefully designed large scale datasets [57] and some other relies on paired training data [17,44]. Capturing image pairs and selecting large-scale data obstruct the practical implication. Meanwhile, learning-based restoration for backlit images (LBR) [58], and multiview high dynamic range imaging [47, 59] methods have been proposed by splitting the input images into front-lit and backlit regions, a computationally complex process. Classic retinex model-based methods, such as robust retinex model (RRM) [45], deep retinex [15] split the image into reflection and illumination, however produce inconsistent reflection. Follow-up of these methods, such as kinD [16] and dark to bright view (D2BV-Net), relies on image pairs. After that, almost all of the methods follow the predominant trend and rely on image pairs [16], priors [43], and careful selection of data [22]. Consequently, the capacity of these models heavily relies on the type of training data, synthetic images and gamma corrections, etc. Collecting such data is an impractical and tedious task that raises the dependency on input training data besides computational complexity. This is especially true in extremely low light restoration, where colors are difficult to recover and noise suppression is challenging. As a result, it’s customary to design a large-scale/paired training dataset which raises the computational complexity and raises issues regarding real-world deployment.

In view of the above problems, the contributions of this work can be summarized below.

• We separate the high-frequency reflection component from the illumination component to divide the overall solution space into two subspaces and minimize computational complexity. To constrain the ill-posed decomposition, we employ a comprehensive weighting scheme to improve visualization in combination with adaptive adjustments through hyperparameters in the weight functions in the proposed scheme.
• We propose to seek guidance for the illumination component via a high-frequency reflection component to preserve the structure and texture details in degraded visual conditions. In this regard, we also optimize the total variation loss to act as strong preservation prior.
• The proposed scheme is embedded into our deep into darkness network, consisting of a multilayered deep-net to separate reflection and illumination components and a bright-net (i.e., encoder-decoder) for piece-wise smoothness of illumination. Independent refinement operations obstruct the amplification of the artifacts at subsequent enhancement stages, which significantly improve detection performance in darkness. Our framework is capable of working without any dependency on the input data (i.e., low/normal light or low-normal image pairs).
Experimental results demonstrate that the proposed method achieves superior performance as compared to the state-of-the-art approaches.

2. Methodology

The framework of the proposed method is shown in Fig. 1. As can be seen, the input image $I_x$ is split into reflection $R_x$ and illumination $T_x$ components with the help of a multilayer deep net.

(1)$$I_x = R_x \cdot T_x$$

Unlike direct amplification, which distorts the boundaries and visual contrast, we rely on the ill-posed image decomposition to independently handle the structure and texture distortions. It allows constraining irregularities of high-frequency reflection components while extracting maximum information with independent adjustments. Whereas the illumination adjustments in the bright-net (i.e., encoder-decoder-styled unit) maintain the balance of visual smoothness. It is achieved through adaptive illumination guidance obtained through the high-frequency component. The subsequent loss adjustments based on regularization of high-frequency reflection and illumination components will limit the objectionable artifacts significantly to improve the brightness, visual smoothness and contrast to improve the performance of detection and tracking systems in extremely dark and degraded visual conditions.

Fig. 1. Framework of the proposed method. A Multilayer Deep-Net separates reflection and illumination, which are refined with proposed weighting parameters. A Bright-Net performs illumination refinement operations. The final image is obtained as a product of refined reflection, and illumination is utilized in the YoloV3 framework for detection.

Download Full Size | PDF

2.1 Proposed deep into darkness strategy

The recent supervised learning [20,15,16,44,57] and unsupervised learning based methods [22,60,55] have achieved a great success, but cannot produce satisfactory results in degraded visual conditions. The model’s success in such circumstances is associated with enhancement and computational complexity, latency, and generalizability. The supervised learning models utilized the image pairs with ground truth, whereas unsupervised learning-based methods work without paired supervision, but require a large scale and a carefully designed dataset. However, the practical implication poses several challenges to both categories.

Our method addresses these challenges through a hybrid learning framework, which works without relying on the data type and learns through decomposition consistency. Maximum information through the high-frequency reflection component can provide adaptive guidance for the illumination component for visual smoothness. A comprehensive weighting estimation scheme for an image ${I_x}$ with reflection $R_x$, illumination, ${T_x}$, can remove the darkness in the degraded visual conditions as shown in Fig. 2 and Fig 3.

It is an intuitive observation that most of the pixels in an 8-bit haze image have maximum intensity (i.e., maximum N = 255), which is why it appears to be white. Thus, the darkening benefits the haze image, and haze removal can be thought of as an inverse of the low light image. Consequently, the residual image is estimated as $N$ - $I_x$ for low light and haze images, with $N$ acting as local adjustment parameters. It shows that images in degraded visual conditions share the $T_x$ for all 8-bit images. Following the aforementioned process, we split the image space into two sub-spaces. As it is a highly ill-posed problem and demands additional regularization constraints.Thus to simplify each of the solution spaces, we directly propose two weight functions ${\Im }$ (i.e., illumination propagation function) and ${\Re }$ (i.e., reflection intensification function) considering the mutual consistency of image decomposition into $T_x$ and $R_x$ in Eq. (1).

(2)$$\Re= \frac{{exp^{\left({\zeta -\frac {R_x}{N}}\right)}}}{n}$$

(3)$$\Im= \frac{{exp^{\left(\zeta-{\frac {N-T_x}{N}}\right)}}}{n}$$

where, $1\geq \zeta \leq 255$ and $0<n\leq 1$ act as global adjustment parameter to directly promote brightness and contrast in combination with the $\Re$, and $\Im$. In this work, the value of $\zeta$ varies from 1 to 128 and the value of $N$ varies from 1 to 255, based on extensive experiments for the best visual and structure preservation following the network loss adjustments. These parameters control the steepness of the image gradient. A significant trade-off for the local and global parameters mitigates the diversion of black spots on final reflection. The point-wise product of these parameters with $R_x$ and $T_x$ respectively adjust the reflection irregularities and illumination inconsistencies to promote smoothness for ill-posed image decomposition. It can be seen in Fig. 2, (a) is the input, (b) several artifacts (i.e., structure and texture distortions, noises), (c) is illumination, (d) illumination adjustments, (e) reflection, and (f) shows reflection adjustments. We proposed to estimate the reflection component in a new manner with an adaptive hyper-parameter ${0 < n \leq {1}}$ to divert the irregularities of reflection as shown in Eq. (2). The lower value of this parameter promotes energy and brightness to the pixels and vice versa. Once the training process is complete, it acts as a direct and global parameter and contributes to directly extracting the contrast for the darkest background regions, where the final image is obtained as a product of the reflection and illumination component.

(4)$$\hat{I_x} = \hat{R_x}\cdot T_x$$

where the value of $\hat {R}$ = $R_x \cdot \Re$ . It is established through extensive experiments that the variation in the value of hyper-parameter $n$ in the weight function $\Re$ can change the balance of brightness in the final image. It contributes to extracting the darkest background regions without passing through the entire training process. It has many interesting applications, including image-based relighting, detection, and tracking in extreme dark background scenarios. However, it is out of the scope of this paper to explain all of these applications in detail; thus, we will extend the proposed scenario for object detection.

Fig. 2. Operational adaptability of the proposed approach, (a) is the input, (b) several artifacts (i.e., structure and texture distortions, noises), (c) is illumination, (d) illumination adjustments, (e) reflection, and (f) shows reflection adjustments.

Download Full Size | PDF

2.2 Proposed network architecture

The proposed deep into darkness net comprised of two sub-networks shown in Fig. 4 and 5. The network is end-to-end trained, where the input image is decomposed into reflection and illumination components in a subnet termed a deep-net. Another encoder-decoder sub-net, denoted as bright-net, acts to induce structure and texture refinement with adaptive illumination adjustments. The above-mentioned weight functions are embedded in our network for subsequent enhancement operations, as shown in Fig. 1. In deep-net, a stack of convolutional layers ($3 \times 3$) initializes the features’ extraction for the given input data. Next, the rectified linear unit (ReLU) with a stack of conv-layers ($3 \times 3$) is applied, and the features are concatenated. After that, several conv-layers with ReLU are applied. Finally, Tanh activation is applied to bound the reflection and illumination features in the specified range.

As mentioned earlier that it is desirable to constrain the illumination inconsistencies for the piece-wise smoothness to induce structure and texture awareness. But the illumination should be consistent with respect to the structure, texture, and boundaries. It demands exponential penalization of the image gradient component to obtain strict control over the steepness of the slope. To make these contemporary adjustments and guide the illumination component to retain the structure and boundaries, we employ an encoder-decoder-unit with up-sampling $3\times 3$ and down-sampling $3\times 3$ to maintain the illumination consistency. Upsampling and down-sampling suppress the noises, where we introduced skip connections to maintain the brightness. Multiscale consistency of illumination is achieved through the concatenation operation on the illumination. It is important to note that inconsistent illumination diverts the noises and black spots on the reflection component, where we guide the illumination with the proposed weight functions to preserve the structure and texture details. It improves the illumination consistency while maintaining the balance of color, contrast, and brightness in the final image.

It is also desirable to maintain strong boundaries and edges. But strong preservation priors are required in the case of the target problem. In this regard, we employed a total variation (TV) operator to act as preservation prior while obstructing the amplification of noises. However, it becomes structurally blind in the target problem. We optimize TV operation with proposed weight functions to obstruct the structural blindness of the TV. We consider both horizontal and vertical components of the gradient to obtain strict control over the steepness of the image gradient. The effectiveness of the proposed operations is shown in Fig. 3, where (a) shows the input, (b) shows the amplification of the structure and texture artifacts, and (c) depicts the effect of the proposed operations to suppress the amplification of the objectionable artifacts. It can be seen that the proposed operations improve the brightness and contrast while preserving the strong boundaries and edges.

Fig. 3. Preservation of the color and contrast with structure and texture details. (a) input images from exclusive dark dataset [63], (b) direct enhancement, and (c) with the proposed method.

Download Full Size | PDF

Fig. 4. Proposed multi-layered deep-net to decomposed image into $R_x$ and $T_x$.

Download Full Size | PDF

Fig. 5. Proposed encoder-decoder style bright-net for illumination refinement.

Download Full Size | PDF

We conducted a series of experiments to demonstrate that the features extracted through a stack of convolutional layers and constrained through an activation function (i.e., $Tanh$ in this case.) can produce significant results for the image enhancement under the guidance of the proposed strategy. We separate the high-frequency reflection component from the illumination component to initialize the subsequent enhancement operations to handle each component independently. The independent refinement operations obstruct the artifacts’ amplification in the subsequent stages. The illumination adjustments are made under the guidance of the residual image followed by a high-frequency component to generalize the network’s operation, as shown in the loss adjustment operations in the next section.

2.3 Loss adjustments

The loss function of the proposed network is designed to constrain the artifacts associated with each of the components (i.e., $R_x$ and $T_x$). We perform experiments with both $l_1$ and $l_2$ norm. During experiments, it was noted that we could eliminate the noises using the squared ${l_2}$ norm, but this regrettably penalizes the gradients and hardly preserves the edges. Instead of using the squared $l_2$ norm, we proposed to utilize the ${l_1}$ norm of the gradient to preserve the boundaries and edges. The reflection and illumination components were further weighted with the proposed scheme to mitigate the irregularities of total variation (TV) [61].

We confine the enhancement losses in three distance terms including, image formation loss, ${\mathscr {L}_{form}}$, reflection loss, ${\mathscr {L}_{R}}$, and illumination adjustment loss ${\mathscr {L}_{R}}$, calculated as ${l_1}$ norm. The loss functions used in the final image reconstruction are summed up together for the proposed architecture.

(5)$$\mathscr{L} = \lambda _{form} \mathscr{L}_{form}+ \lambda _{R} \mathscr{L}_{R}+ \lambda _{T} \mathscr{L}_{T}$$

The tradeoff parameters $\lambda _{form}$ is 0.1, $\lambda _{R}$, and $\lambda _{T}$ is 0.001. The ${\mathscr {L}_{form}}$ is designed to constrain the reconstruction error and estimated as below:

(6)$$\mathscr{L}_{form} = \left \| I_x-R_xT_x \right \|$$

where $\zeta = 1$, the reflection intensification parameter, $\Re$, and illumination propagation parameters $\Im$ update the $R_x$ and $T_X$ respectively as a product, which improves the network latency and promotes smoothness respectively, for ill-posed image decomposition. It can be seen in Fig. 2 and Fig. 3.

(7)$$\mathscr{L}_{T} = \left \| \bigtriangledown T_x \cdot \Im \cdot \exp(\psi R_x \cdot \Re) \right \|$$

The product of illumination with propagation parameters $\Im$ shown in Eq. (3) induce updated illumination to seek the guidance through reflection, which itself get updated as product with $\Re$, parameter expressed in Eq. (2). In addition a sustainable trade-off parameter (i.e., ${\psi }$ = -10) is introduced in Eq. (7) for a direct control in observable guidance.

Network optimization settings: The network is trained from scratch using Adam-Optimizer and back-propagation in the Tensorflow framework with a learning rate of $1e^{-3}$, where a batch size is 16 and a patch size of 128x128 for 50 epochs.

3. Datasets, experiments, and discussions

3.1 Datasets

For many years, researchers have been working on back-lit [58], ill-lit [14], and low-light [20,15] image enhancement problems. We utilize lowlight datasets, i.e., UXOV [14], LOL [15] and Adobe FiveK dataset [62] sample test images. The lowlight UXOV dataset consists of 1300 images with 18 distinct lighting conditions, and the LOL dataset consists of 500 lowlight image pairs. Adobe-FiveK dataset utilized in this work consists of 5000 images with several indoor and outdoor scenarios and lighting conditions captured with the help of a DSLR camera. The images in this dataset were manually retouched and adjusted with 5 experts who were labeled as A, B, C, D, and E. We utilized the last 500 images as test images and selected Expert C’s images as ground-truth images during testing. In addition, to evaluate object detection (i.e., people detection in this case) performance, we utilize 300 images from the exclusive dark dataset [63].

3.2 Evaluation metrics

The performance for the enhancement is tested in terms of peak signal-to-noise ratio (PSNR), structure similarity index measure (SSIM) [64], Light order enhancement (LOE) [64], naturalness image quality evaluator (NIQE) [65], and a learned perceptual image patch similarity (LPIPS) [66]. The performance in the case of object detection is measured in terms of Average Precision (AP) and Average Recall metrics.

3.3 Subjective and objective comparison

We provide a subjective and objective comparison to evaluate the performance of the proposed method. We compare our method with several low light state of the art image enhancement approaches, including KinD [16], deep retinex [15], multi-branch low-light enhancement net (MBLLEN) [49], Glad-Net [67], HDR-Net [17], Enlighten-GAN [22], Zero-DCE [68], dark to bright view (D2BV) [14], and joint enhancement and denoising (JED) [51], a retinex-inspired unrolling with architecture search (RUAS) [44], LIME [43] and a beyond brightening low light image enhancement (BBLLI) [21] methods.

The objective comparison of the LOL and UXOV datasets and Adobe FiveK datasets is presented in Table 1, Table 2 and Table 3 respectively. Our method achieves the best numeric scores without any reliance on the image pairs and large-scale or carefully selected images, which is a prerequisite by almost all of the extant methods in the target domain. Our method requires only a few shots of input images without careful data selection.

Table 1. Objective evaluation, PSNR and SSIM comparison with various methods on UXOV-dataset.

View Table | View all tables in this article

Table 2. Objective quality comparison on the LOL dataset images and dependencies of various state-of-the-art approaches.

View Table | View all tables in this article

Table 3. Comparison of the Adobe FiveK dataset sample test images in terms of PSNR, SSIM, LOE, and NIQE objective metrics.

View Table | View all tables in this article

In order to illustrate the effectiveness of the visual quality of the proposed framework, we present the visual comparison in Fig. 6, Fig. 7, Fig. 8, Fig. 9 and Fig. 10. In Fig. 6 the model images captured in back-lighting conditions are compared with LBR [58], RRM [45], BBLLI [21], and zero-DCE [68] methods. The LBR method splits the input image into front-lit and backlit images and is suitable only for backlit images. Other deep learning-based methods suffer from over-or-under enhancement problems. Such as Retinex produces texture amplification, and BBLI somehow suffers from under-enhancement problems. Similarly, the comparison in Fig. 7 demonstrates that the proposed method preserves better color and contrast. In Fig. 8, the overall image quality of the proposed method is closest to the color and contrast in the ground truth images. The comparison of our method with various competitor approaches on the color checker images with emo images in Fig. 7 depicts that the proposed method preserves more details to maintain a visual balance with brightness and contrast. The overall visual quality and results clearly illustrate that some of the competitor methods cannot restore the complete details in this extremely dark scenario. At the same time, others are unable to restore the colors and contrast. It is further elaborated in Fig. 7 and Fig. 9 where the zoomed-in patches depict the quality of the output images. Our method preserves the color, contrast, and boundaries which improve object detection in dark and degraded visual scenarios.

Fig. 6. Comparison on back-lit-model images from UXOV-dataset [14], with the various methods.

Download Full Size | PDF

Fig. 7. Comparison of the proposed method with various state-of-the-art approaches by using UXOV [14] and LOL [15] data set images.

Download Full Size | PDF

Fig. 8. Comparison of ours method with various methods on Adobe FiveK [62] dataset sample images.

Download Full Size | PDF

Fig. 9. Comparison of our method with various methods on extreme dark scenario data samples [15].

Download Full Size | PDF

Fig. 10. Comparison of proposed method (Ours) method with various state-of-the-art approaches on exclusive dark dataset [63] sample images .

Download Full Size | PDF

3.4 Detection in the dark scenarios

We are aware that the visual quality of the lowlight images often defies the detection algorithms. The results produced by the extant methods face several subjective and objective degradation and hardly represent the naturalness of the output images. The visual quality level might change due to personal choice. Similarly, adaptiveness in the change in brightness can improve the quality of machine vision-based face detection systems. In order to evaluate the performance of the detection and recognition in the dark scenarios, we perform the experiments for people detection in low lighting/extreme dark scenarios on 300 images randomly selected from the Exclusive dark dataset [63]. In this regard, we utilize YoloV3 [3] trained on COCO images [69] to illustrate the performance of the person detection task in combination with the proposed enhancement method. The results are shown in terms of Average Precision (mAP$\%$) and Average Recall ($\%$) with respect to several state-of-the-art approaches. The enhancement results of the HDR-Net [17], LIME [43], Retinex [15] and KinD-Net [16] D2D-Net (ours) are fed to the pre-trained YOLOV3 to distinguish the detection capability. The results in terms of mAP and Recall are shown in Table 4. The overall quantitative comparison of the competitor’s approaches in terms of mAP and average recall for person detection is shown in Fig. 11. The detection results in extreme dark scenarios depend heavily on the enhancement algorithms. In the case of uneven/exclusive dark scenarios, the worst noises hidden in the darkest areas get amplified during enhancement. The amplification of the objectionable artifacts (i.e., structure, texture, and boundaries) deteriorate the overall image quality and limit the performance of detection systems. It can be seen in Fig. 10 that our method has the highest detection rate. For example, Rerinex in the second row deteriorates the global image structure with texture amplification. Thus, it can not detect some of the objects in an extremely dark background (Zoom in to monitor and read the detection boxes ($\%$), best zoom at $300 \%$). In the last row of Fig. 10, the comparison of the people depicts that our method detects both objects but the competitor Zero-DCE detects one object. It depicts the difficulty of the detection in dark scenarios, where strong boundaries and edges contribute to improving the detection mechanism’s performance.

Fig. 11. Quantitative comparison of average precision and average recall values for the person detection.

Download Full Size | PDF

Table 4. Objective comparison of face detection in terms of average precision and average recall score.

View Table | View all tables in this article

3.5 Discussions

Most of the existing methods are designed in a heuristic manner without considering the consistency of the input data. In such cases, artifacts get amplified at every stage during the enhancement process. In contrast, we embed the knowledge of data decomposition in the network and refined decomposed components independently with the proposed weight function to obstruct the amplification of objectionable artifacts at subsequent enhancement stages.

It is important to note that the classic methods have a huge computational complexity. The complexity of deep learning-based methods also increases with large-scale and paired training datasets. Because gathering a large-scale dataset for network supervision or synthesizing the image pairs still have several disadvantages. Such as, if the lighting appointment level is missing in the training set, the learning behavior of the network for the camera response might not predict in unknown scenarios. Therefore more adaptive and free-style methods are imperative to handle such problems.

The proposed method considers these limitations and works without the imposition of the paired training data/careful and large-scale data-set selection. The overall visual comparison and performance in terms of objective metrics demonstrate that our method outperformed the state-of-the-art approaches. Moreover, it is more suitable in robust real-world scenarios, where degraded visual conditions pose several challenges to the detection and recognition system. The evaluation of object detection also illustrates the superiority of our method.

4. Conclusion

This work presents a realistic framework to improve the visual quality in degraded visual conditions (e.g., extreme dark background, lowlight, backlight, mist. etc.), with a balance of brightness and contrast. A new strategy is proposed and embedded in a simple yet effective network denoted as deep into darkness net (D2D-net) to promote brightness and extract contrast for the darkest regions. We separate the high-frequency reflection component from illumination in the input image with a multilayer subnet (i.e., deep-net). In addition, the structure, texture, and strong boundaries are preserved with subsequent illumination adjustments in another encoder-decoder-styled subnet (i.e., bright-net) under the guidance of the high-frequency component. The loss functions of the network are optimized in combination with the proposed weighted refinement. The network learns to decompose, refine and recompose images following proposed enhancement operations. The extant supervised and unsupervised methods rely on large-scale and even paired training datasets. The proposed method work irrespective of the type and quantity of training data. Our experiments were conducted on a variety of difficult benchmark datasets, yielding new state-of-the-art findings for image enhancement. The enhancement output is also tested for the person-detection in the darkest scenarios, where the proposed method achieves superior results.

Funding

National Natural Science Foundation of China (11871438); Natural Science Foundation of Zhejiang Province (LZ22F020010); Zhejiang Normal University Research Fund (ZC304022915).

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

References

1. C. Li, C. Guo, L.-H. Han, J. Jiang, M.-M. Cheng, J. Gu, and C. C. Loy, “Low-light image and video enhancement using deep learning: a survey,” in IEEE Transactions on Pattern Analysis & Mach. Intell., p. 1, (2021).

2. Y. J. Jung, “Enhancement of low light level images using color-plus-mono dual camera,” Opt. Express 25(10), 12029–12051 (2017). [CrossRef]

3. J. Redmon and A. Farhadi, “Yolov3: An incremental improvement,” arXiv preprint arXiv:1804.02767 (2018).

4. P. Wani, K. Usmani, G. Krishnan, T. O’Connor, and B. Javidi, “Lowlight object recognition by deep learning with passive three-dimensional integral imaging in visible and long wave infrared wavelengths,” Opt. Express 30(2), 1205–1218 (2022). [CrossRef]

5. N. Wang, B. Zheng, H. Zheng, and Z. Yu, “Feeble object detection of underwater images through lsr with delay loop,” Opt. Express 25(19), 22490–22498 (2017). [CrossRef]

6. W. Cho, J. Jang, A. Koschan, M. A. Abidi, and J. Paik, “Hyperspectral face recognition using improved inter-channel alignment based on qualitative prediction models,” Opt. Express 24(24), 27637–27662 (2016). [CrossRef]

7. E. H. Land and J. J. McCann, “Lightness and retinex theory,” J. Opt. Soc. Am. 61(1), 1–11 (1971). [CrossRef]

8. W. Wang, C. Zhang, and M. K. Ng, “Variational model for simultaneously image denoising and contrast enhancement,” Opt. Express 28(13), 18751–18777 (2020). [CrossRef]

9. N. A. Riza, J. P. La Torre, and M. J. Amin, “Caos-cmos camera,” Opt. Express 24(12), 13444–13458 (2016). [CrossRef]

10. G. Chen, L. Li, W. Jin, J. Zhu, and F. Shi, “Weighted sparse representation multi-scale transform fusion algorithm for high dynamic range imaging with a low-light dual-channel camera,” Opt. Express 27(8), 10564–10579 (2019). [CrossRef]

11. M. Nikolova and G. Steidl, “Fast hue and range preserving histogram specification: Theory and new algorithms for color image enhancement,” IEEE Trans. on Image Process. 23(9), 4087–4100 (2014). [CrossRef]

12. Y. Ueda, H. Misawa, T. Koga, N. Suetake, and E. Uchino, “Hue-preserving color contrast enhancement method without gamut problem by using histogram specification,” in 2018 25th IEEE International Conference on Image Processing (ICIP), (IEEE, 2018), pp. 1123–1127.

13. S. Ahn, J. Shin, H. Lim, J. Lee, and J. Paik, “Coden: combined optimization-based decomposition and learning-based enhancement network for retinex-based brightness and contrast enhancement,” Opt. Express 30(13), 23608–23621 (2022). [CrossRef]

14. R. Khan, Y. Yang, Q. Liu, J. Shen, and B. Li, “Deep image enhancement for ill light imaging,” J. Opt. Soc. Am. A 38(6), 827–839 (2021). [CrossRef]

15. C. Wei, W. Wang, W. Yang, and J. Liu, “Deep retinex decomposition for low-light enhancement,” in British Machine Vision Conference, (2018).

16. Y. Zhang, J. Zhang, and X. Guo, “Kindling the darkness: A practical low-light image enhancer,” in Proceedings of the 27th ACM international conference on multimedia, (2019), pp. 1632–1640.

17. M. Gharbi, J. Chen, J. T. Barron, S. W. Hasinoff, and F. Durand, “Deep bilateral learning for real-time image enhancement,” ACM Transactions on Graph. 36(4), 1–12 (2017). [CrossRef]

18. R. Liu, L. Ma, Y. Zhang, X. Fan, and Z. Luo, “Underexposed image correction via hybrid priors navigated deep propagation,” IEEE transactions on neural networks learning systemsPP (2021).

19. K. Liu and Y. Liang, “Underwater image enhancement method based on adaptive attenuation-curve prior,” Opt. Express 29(7), 10321–10345 (2021). [CrossRef]

20. C. Chen, Q. Chen, J. Xu, and V. Koltun, “Learning to see in the dark,” in Proceedings of the IEEE conference on computer vision and pattern recognition, (2018), pp. 3291–3300.

21. Y. Zhang, X. Guo, J. Ma, W. Liu, and J. Zhang, “Beyond brightening low-light images,” Int. J. Comput. Vis. 129(4), 1013–1037 (2021). [CrossRef]

22. Y. Jiang, X. Gong, D. Liu, Y. Cheng, C. Fang, X. Shen, J. Yang, P. Zhou, and Z. Wang, “Enlightengan: Deep light enhancement without paired supervision,” IEEE Trans. on Image Process. 30, 2340–2349 (2021). [CrossRef]

23. L. Sun, C. Tang, M. Xu, and Z. Lei, “Non-uniform illumination correction based on multi-scale retinex in digital image correlation,” Appl. Opt. 60(19), 5599–5609 (2021). [CrossRef]

24. R. Ma, Q. Gao, Y. Qiang, and K. Shinomori, “Robust categorical color constancy along daylight locus in red-green color deficiency,” Opt. Express 30(11), 18571–18588 (2022). [CrossRef]

25. K. Wu, Y. Yang, M. Yu, and Q. Liu, “Block-wise focal stack image representation for end-to-end applications,” Opt. Express 28(26), 40024–40043 (2020). [CrossRef]

26. K. Wu, Y. Yang, Q. Liu, and X.-P. Zhang, “Focal stack image compression based on basis-quadtree representation,” IEEE Transactions on Multimedia, (2022).

27. B. Guenter, N. Joshi, R. Stoakley, A. Keefe, K. Geary, R. Freeman, J. Hundley, P. Patterson, D. Hammon, and G. Herrera, “Highly curved image sensors: a practical approach for improved optical performance,” Opt. Express 25(12), 13010–13023 (2017). [CrossRef]

28. G. M. Schuster, D. G. Dansereau, G. Wetzstein, and J. E. Ford, “Panoramic single-aperture multi-sensor light field camera,” Opt. Express 27(26), 37257–37273 (2019). [CrossRef]

29. Y. Luo, J. Jiang, M. Cai, and S. Mirabbasi, “Cmos computational camera with a two-tap coded exposure image sensor for single-shot spatial-temporal compressive sensing,” Opt. Express 27(22), 31475–31489 (2019). [CrossRef]

30. K. Wang, Y. Qian, M. Ye, and Z. Luo, “Flexible focus function consisting of convex function and image enhancement filter,” Opt. Express 22(15), 18668–18687 (2014). [CrossRef]

31. Y. Zhu and G. D. Finlayson, “Matched illumination: using light modulation as a proxy for a color filter that makes a camera more colorimetric,” Opt. Express 30(12), 22006–22024 (2022). [CrossRef]

32. S. M. Pizer, E. P. Amburn, J. D. Austin, R. Cromartie, A. Geselowitz, T. Greer, B. ter Haar Romeny, J. B. Zimmerman, and K. Zuiderveld, “Adaptive histogram equalization and its variations,” Comput. vision, Graph. Image Process. 39(3), 355–368 (1987). [CrossRef]

33. H. Farid, “Blind inverse gamma correction,” IEEE Trans. on Image Process. 10(10), 1428–1433 (2001). [CrossRef]

34. C. Lee, C. Lee, and C.-S. Kim, “Contrast enhancement based on layered difference representation of 2d histograms,” IEEE Trans. on Image Process. 22(12), 5372–5384 (2013). [CrossRef]

35. H. Kim, S.-M. Choi, C.-S. Kim, and Y. J. Koh, “Representative color transform for image enhancement,” in Proceedings of the IEEE/CVF International Conference on Computer Vision, (2021), pp. 4459–4468.

36. D. J. Jobson, Z.-u. Rahman, and G. A. Woodell, “A multiscale retinex for bridging the gap between color images and the human observation of scenes,” IEEE Trans. on Image Process. 6(7), 965–976 (1997). [CrossRef]

37. H.-G. Lee, S. Yang, and J.-Y. Sim, “Color preserving contrast enhancement for low light level images based on retinex,” in 2015 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA), (IEEE, 2015), pp. 884–887.

38. A. N. Tikhonov and V. Y. Arsenin, “Solutions of ill-posed problems,” (1977).

39. X.-F. Liu, X.-R. Yao, R.-M. Lan, C. Wang, and G.-J. Zhai, “Edge detection based on gradient ghost imaging,” Opt. Express 23(26), 33802–33811 (2015). [CrossRef]

40. H. Alismail, B. Browning, and S. Lucey, “Robust tracking in low light and sudden illumination changes,” in 2016 Fourth International Conference on 3D Vision (3DV), (IEEE, 2016), pp. 389–398.

41. Z. Zou, Z. Shi, Y. Guo, and J. Ye, “Object detection in 20 years: A survey,” arXiv preprint arXiv:1905.05055 (2019).

42. X. Luo, S. Xiang, Y. Wang, Q. Liu, Y. Yang, and K. Wu, “Dedark+ detection: A hybrid scheme for object detection under low-light surveillance,” in ACM Multimedia Asia, (2021), pp. 1–5.

43. X. Guo, Y. Li, and H. Ling, “Lime: Low-light image enhancement via illumination map estimation,” IEEE Trans. on Image Process. 26(2), 982–993 (2017). [CrossRef]

44. R. Liu, L. Ma, J. Zhang, X. Fan, and Z. Luo, “Retinex-inspired unrolling with cooperative prior architecture search for low-light image enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2021), pp. 10561–10570.

45. M. Li, J. Liu, W. Yang, X. Sun, and Z. Guo, “Structure-revealing low-light image enhancement via robust retinex model,” IEEE Trans. on Image Process. 27(6), 2828–2841 (2018). [CrossRef]

46. C. Li, J. Guo, C. Guo, R. Cong, and J. Gong, “A hybrid method for underwater image correction,” Pattern Recognit. Lett. 94, 62–67 (2017). [CrossRef]

47. R. Khan, Y. Yang, Q. Liu, and Z. H. Qaisar, “A ghostfree contrast enhancement method for multiview images without depth information,” J. Vis. Commun. Image Represent. 78, 103175 (2021). [CrossRef]

48. Q. Wang, X. Fu, X.-P. Zhang, and X. Ding, “A fusion-based method for single backlit image enhancement,” in IEEE International Conference on Image Processing (IEEE, 2016), pp. 4077–4081.

49. J. W. Feifan Lv, Feng Lu, and C. Lim, “MBLLEN: Low-light image/video enhancement using cnns,” in Br. Mach. Vis. Conf., (2018).

50. D. Polevoy, E. Panfilova, E. Ershov, and D. Nikolaev, “Color correction of the document owner’s photograph image during recognition on mobile device,” in Thirteenth International Conference on Machine Vision, vol. 11605 (International Society for Optics and Photonics, 2021), p. 1160510.

51. X. Ren, M. Li, W.-H. Cheng, and J. Liu, “Joint enhancement and denoising method via sequential decomposition,” in IEEE International Symposium on Circuits and Systems (IEEE, 2018), pp. 1–5.

52. J. Zhou, X. Wei, J. Shi, W. Chu, and Y. Lin, “Underwater image enhancement via two-level wavelet decomposition maximum brightness color restoration and edge refinement histogram stretching,” Opt. Express 30(10), 17290–17306 (2022). [CrossRef]

53. J. Liu, Z. Liu, Y. Wei, and W. Ouyang, “Recovery for underwater image degradation with multi-stage progressive enhancement,” Opt. Express 30(7), 11704–11725 (2022). [CrossRef]

54. K. He, J. Sun, and X. Tang, “Single image haze removal using dark channel prior,” IEEE Trans. Pattern Anal. Mach. Intell. 33(12), 2341–2353 (2011). [CrossRef]

55. R. Khan, Y. Yang, Q. Liu, and Z. H. Qaisar, “Divide and conquer: Ill-light image enhancement via hybrid deep network,” Expert. Syst. with Appl. 182, 115034 (2021). [CrossRef]

56. R. Khan, Q. Liu, and Y. Yang, “A deep hybrid few shot divide and glow method for ill-light image enhancement,” IEEE Access 9, 17767–17778 (2021). [CrossRef]

57. F. Lv, Y. Li, and F. Lu, “Attention guided low-light image enhancement with a large scale low-light simulation dataset,” Int. J. Comput. Vis. 129(7), 2175–2193 (2021). [CrossRef]

58. Z. Li and X. Wu, “Learning-based restoration of backlit images,” IEEE Trans. on Image Process. 27(2), 976–986 (2018). [CrossRef]

59. R. Khan, A. Akram, and A. Mehmood, “Multiview ghost-free image enhancement for in-the-wild images with unknown exposure and geometry,” IEEE Access 9, 24205–24220 (2021). [CrossRef]

60. R. Wang, Q. Zhang, C.-W. Fu, X. Shen, W.-S. Zheng, and J. Jia, “Underexposed photo enhancement using deep illumination estimation,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2019), pp. 6849–6857.

61. M. K. Ng and W. Wang, “A total variation model for retinex,” SIAM J. on Imaging Sci. 4(1), 345–365 (2011). [CrossRef]

62. V. Bychkovsky, S. Paris, E. Chan, and F. Durand, “Learning photographic global tonal adjustment with a database of input/output image pairs,” in CVPR 2011, (IEEE, 2011), pp. 97–104.

63. Y. P. Loh and C. S. Chan, “Getting to know low-light images with the exclusively dark dataset,” Comput. Vis. Image Underst. 178, 30–42 (2019). [CrossRef]

64. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Trans. on Image Process. 13(4), 600–612 (2004). [CrossRef]

65. A. Mittal, R. Soundararajan, and A. C. Bovik, “Making a completely blind image quality analyzer,” IEEE Signal Process. Lett. 20(3), 209–212 (2013). [CrossRef]

66. W. Yang, S. Wang, Y. Fang, Y. Wang, and J. Liu, “From fidelity to perceptual quality: A semi-supervised approach for low-light image enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 3063–3072.

67. W. Wang, C. Wei, W. Yang, and J. Liu, “Gladnet: Low-light enhancement network with global awareness,” in IEEE International Conference on Automatic Face & Gesture Recognition, (IEEE, 2018), pp. 751–755.

68. C. Guo, C. Li, J. Guo, C. C. Loy, J. Hou, S. Kwong, and R. Cong, “Zero-reference deep curve estimation for low-light image enhancement,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, (2020), pp. 1780–1789.

69. T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft coco: Common objects in context,” in European conference on computer vision, (Springer, 2014), pp. 740–755.

Methods	D2BI-Net	RRM	Retinex	JED	KinD	LBR	Ours
Average-PSNR	16.8674	13.9144	11.4625	13.3289	12.2019	15.8892	18.8763
Average-SSIM	0.8059	0.7719	0.71271	0.7659	0.7549	0.7870	0.8259

Methods	Data Dependency (Pairs and Priors)	PSNR	SSIM	LPIPS
LIME	Priors	16.75	0.56	0.73
KinD	Data Pairs	20.87	0.80	0.17
Retinex	Data Pairs	16.77	0.56	0.47
MBLLEN	Data type	13.93	0.49	0.69
Glad-Net	Data type	16.18	0.60	0.20
Enlighten-GAN	Data type	17.48	0.65	0.47
Zero-DCE	Data type	14.86	0.54	0.33
RUAS	Data Pairs	18.23	0.72	0.35
BBLLI	Data Pairs	21.30	0.82	0.17
D2D-Net (ours)	No pairs, No data type	22.77	0.84	0.14

Metrics	Retinex	GladNet	D2BI	KinD	BBLI	Ours
PSNR	20.8069	21.1232	19.3541	20.6758	21.9916	23.3827
SSIM	0.7658	0.7579	0.7129	0.7970	0.8010	0.8206
NIQE	5.1699	3.9426	4.3905	3.6569	3.5117	3.5327
LOE	461.9	525.7	498.8	957.6	562.3	419.63

Sr#	Methods	mAP	Avg Recall
1	HDR-Net	37.61	41.37
2	Retinex	39.47	45.18
3	LIME	54.35	73.22
4	Zero DCE	51.37	75.30
5	KinD	50.38	72.83
6	Ours	56.31	78.41

Methods	D2BI-Net	RRM	Retinex	JED	KinD	LBR	Ours
Average-PSNR	16.8674	13.9144	11.4625	13.3289	12.2019	15.8892	18.8763
Average-SSIM	0.8059	0.7719	0.71271	0.7659	0.7549	0.7870	0.8259

Robust contrast enhancement method using a retinex model with adaptive brightness for detection applications

Abstract

Corrections

1. Introduction

2. Methodology

2.1 Proposed deep into darkness strategy

2.2 Proposed network architecture

2.3 Loss adjustments

3. Datasets, experiments, and discussions

3.1 Datasets

3.2 Evaluation metrics

3.3 Subjective and objective comparison

3.4 Detection in the dark scenarios

3.5 Discussions

4. Conclusion

Funding

Disclosures

Data availability

References

Data availability

Cited By

Figures (11)

Tables (4)

Equations (7)

Optics Express