Hassan A. Sial, Ramon Baldrich, and Maria Vanrell, "Deep intrinsic decomposition trained on surreal scenes yet with realistic light effects," J. Opt. Soc. Am. A 37, 1-15 (2020)
Estimation of intrinsic images remains a challenging task because of weaknesses in ground-truth datasets, which are either too small or physically unrealistic. Meanwhile, end-to-end deep learning architectures have begun to achieve interesting results, which we believe could be further improved by not ignoring important physical cues. In this work, we present a twofold framework: (a) a flexible image-generation procedure that overcomes classical dataset problems, providing a larger dataset with coherent lighting appearance; and (b) a flexible architecture that ties physical properties together through intrinsic losses. Our proposal is versatile, has low computation time, and achieves state-of-the-art results.
From left to right, the columns report: number of images; whether the ground truth (GT) perfectly fulfills the physical model; whether the GT covers the full image or only a part; whether the GT reflects the influence of a diverse background; whether the GT includes cast shadows in addition to shading; and whether the global image presents physically consistent lighting.
Meaning of special cases: ($ \star $) the MIT dataset generally fulfills the product model up to a scale factor, i.e., $ I = \alpha (R \cdot S) $, but this does not hold exactly for all images and shows small deviations; (‡) the Sintel dataset presents more diverse backgrounds than the rest, but with a strong bias toward specific colors due to the high correlation between frames of a video sequence; and (†) the training area is large but still does not cover the full image.
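The MIT special case above says an image fits $ I = \alpha (R \cdot S) $ only up to a per-image scalar $ \alpha $. A minimal sketch of how one might check this numerically (the function name and least-squares fit of $ \alpha $ are our illustration, not the paper's code):

```python
import numpy as np

def product_model_residual(I, R, S):
    """Check how well an image fits the intrinsic product model I = alpha * (R * S).

    I, R, S: float arrays of the same shape (image, reflectance, shading).
    Returns the best-fit scalar alpha and the RMSE of the reconstruction.
    """
    recon = R * S
    # Closed-form least-squares scalar: alpha = <I, recon> / <recon, recon>
    alpha = float(np.sum(I * recon) / np.sum(recon * recon))
    rmse = float(np.sqrt(np.mean((I - alpha * recon) ** 2)))
    return alpha, rmse
```

A dataset whose GT "perfectly fulfills the physical model" would give a near-zero residual with $ \alpha = 1 $; the MIT images give a small but nonzero deviation even after fitting $ \alpha $.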
Table 2.
Errors for Reflectance and Shading Predictions on Our Dataset^a
Comparison between our IUI architecture and the Retinex algorithm. IUI decreases the Retinex error by the factor given in brackets. Errors are reported separately on the object, on the foreground, and on the whole image.
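Reporting an error "on the object, on the foreground, and on the whole image" amounts to evaluating the same metric under different pixel masks. A hedged sketch of that bookkeeping, using MSE for concreteness (the mask names and the choice of MSE are illustrative assumptions, not the paper's exact metric):

```python
import numpy as np

def masked_mse(pred, gt, mask):
    """Mean squared error restricted to pixels where mask is True."""
    return float(((pred - gt) ** 2)[mask].mean())

def region_errors(pred, gt, object_mask, foreground_mask):
    """Error on three nested regions, in the spirit of the per-region
    columns of Table 2: the object, the foreground, and the full image."""
    full_mask = np.ones_like(object_mask, dtype=bool)
    return {
        "object": masked_mse(pred, gt, object_mask),
        "foreground": masked_mse(pred, gt, foreground_mask),
        "image": masked_mse(pred, gt, full_mask),
    }
```

The improvement factor quoted in brackets would then be the ratio of the Retinex error to the IUI error on the same region.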
Table 3.
Estimation Errors on MIT Dataset Reported in Previous Works by Different Methods and for Our IUI Architecture