Unsupervised learning with a physics-based autoencoder for estimating the thickness and mixing ratio of pigments

Open Access

Abstract

Layered surface objects represented by decorated tomb murals and watercolors are in danger of deterioration and damage. To address these dangers, it is necessary to analyze the pigments’ thickness and mixing ratio and record the current status. This paper proposes an unsupervised autoencoder model for thickness and mixing ratio estimation. The input of our autoencoder is spectral data of layered surface objects. Our autoencoder is unique, to our knowledge, in that the decoder part uses a physical model, the Kubelka–Munk model. Since we use the Kubelka–Munk model for the decoder, latent variables in the middle layer can be interpretable as the pigment thickness and mixing ratio. We conducted a quantitative evaluation using synthetic data and confirmed that our autoencoder provides a highly accurate estimation. We measured an object with layered surface pigments for qualitative evaluation and confirmed that our method is valid in an actual environment. We also present the superiority of our unsupervised autoencoder over supervised learning.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. INTRODUCTION

Artistic works such as decorated tomb murals and watercolors, which are carved and painted, are of high historical value: they reflect their time or show exceptional aesthetics. However, these historical heritages deteriorate or become damaged over time, even with proper preservation. In particular, the fading of pigments is an unavoidable problem. Therefore, it is important to restore faded pigments to their original colors and to record their current state.

Such decorated tomb murals and watercolors have a two-layered structure: a pigment layer and a substrate. Some studies have attempted to reproduce the layered-surface color using currently available pigments. Abed [1] used a physical model based on the Kubelka–Munk (KM) theory [2] to estimate the mixing ratio of pigments with the nonnegative least-squares method. Since the target was oil paints, estimating the pigment mixing ratio without considering the substrate reflectance was sufficient. However, the colors of tomb murals and watercolors are affected by the substrate, so the substrate reflectance must be considered.

Morimoto et al. [3] modeled the absorption of light in the pigment layer and the reflection of the substrate using the Lambert–Beer law and separated the substrate and pigment layers by curve-fitting in RGB space. However, the Lambert–Beer law is too simple since it considers only the attenuation of light. KM theory may be suitable to model both the scattering and attenuation of light in the pigment layer for more detailed analysis.

Recently, deep learning techniques have improved the accuracy of various tasks in computer vision. Fukumoto et al. [4] trained a neural network on a dataset synthesized with various pigment mixing ratios and estimated the combination and concentration of oil paints. Shi et al. [5] used a neural network to estimate the 3D layout of pigments and reproduced the spectrum of an oil painting using multi-material 3D printing. However, such supervised learning of neural networks requires a large dataset. When a large dataset is unavailable, an alternative approach combines neural networks with differentiable physics-based models. This approach is particularly promising in computer graphics, such as 3D scene construction [6].

In this work, we estimate the thickness and the mixing ratios of pigments for layered surface objects. To estimate them, we use an unsupervised physics-based autoencoder with hyperspectral data. The encoder takes hyperspectral data as input for estimating the thickness and mixing ratio of pigments. The decoder reconstructs the input data from these latent values with KM theory. Since we use the physics-based model for the decoder, latent variables in the middle layer can be interpretable. Unlike previous studies, our design does not require a large dataset and enables unsupervised training.

2. RELATED WORK

A. Spectroscopy for Layered Surface Objects

We consider layered surface objects such as decorated tomb murals and watercolors. Pigment mapping is a technique for visualizing the spatial distribution of pigments used on a painting, which provides clues for its restoration. It utilizes hyperspectral imaging to estimate the mixing ratio of pigments pixel by pixel. The technique has two categories: optical-model-based methods [1,3,4,7–10] and methods based on signal processing of the spectrum [11–16].

Optical models describing phenomena in the pigment layer are used to estimate the mixing ratio of pigments. A typical model is the Lambert–Beer law, which has attenuation in the pigment layer as a parameter. Another model is KM theory, which has scattering and absorption in the pigment layer as parameters. Kirchner et al. used KM theory to visualize the spectral image of Vincent van Gogh’s painting “Field with Irises Near Arles” by removing a layer of yellow varnish [17]. Furthermore, they estimated the concentration maps of pigments [18], analyzed how the colors of some pigments changed over time, and reconstructed the original colors of the painting [19]. Another study estimated the pigment concentrations by an encoder–decoder [4], which is one of the architectures of neural networks.

B. Physics-Based Deep Learning

Deep learning, especially convolutional neural networks, has achieved remarkable results in computer vision. For supervised learning, training a network requires a large amount of annotated data. However, annotation can be costly, and annotation itself can be difficult.

Some deep learning studies use physics-based models to avoid the annotation problem, for example, 3D object reconstruction from a single-view image [20] and 3D pose estimation using physics-based models [21,22]. In these studies, the middle layer of the neural network holds physical parameters estimated from the input image, which are then used to synthesize an image with a differentiable renderer. The neural network is trained so that the synthesized image becomes closer to the input image. In this study, we use a similar unsupervised pipeline to estimate the thickness and mixing ratio of pigments.

3. REFLECTANCE MODEL ON LAYERED SURFACE OBJECTS

Wall paintings of decorated tombs and watercolors are called layered surface objects because they consist of two layers: a pigment layer formed by pigments and a substrate. KM theory, a physics-based model that describes both the scattering and absorption of light inside the pigment layer, is widely used for layered surface objects [2].

A. Kubelka–Munk Theory

In the model of reflected light based on KM theory, the following three conditions are assumed to simplify the discussion:

  • the pigment layer is much broader than its thickness so that the effect of the edge is negligible;
  • the pigment particles are optically homogeneous and uniformly distributed in the pigment layer;
  • light that travels parallel to the substrate surface in the pigment layer is negligible.

Under these three assumptions, let the diffuse flux $i(\lambda)$ travel from the pigment layer surface toward the substrate and the diffuse flux $j(\lambda)$ travel from the substrate toward the pigment layer surface, as shown in Fig. 1. Let the thickness of the pigment layer be $X$, the thickness of a micro-unit layer be $dx$, and the distance from the substrate to the micro-unit be $x$. Note that in this paper, thickness refers to optical path length, which is a relative value. The changes in the diffuse flux toward the substrate and in the diffuse flux toward the surface of the pigment layer, as they pass through a micro-unit thickness of the pigment layer, are expressed by Eqs. (1) and (2):

Fig. 1. Light propagation modeled in KM theory. This two-flux model shows that light is observed at the surface while being affected by the pigment and substrate properties.

$$di(\lambda) = (S(\lambda) + K(\lambda))i(\lambda)dx - S(\lambda)j(\lambda)dx,$$
$$dj(\lambda) = - (S(\lambda) + K(\lambda))j(\lambda)dx + S(\lambda)i(\lambda)dx.$$

$S(\lambda)$ and $K(\lambda)$ are the pigment-specific scattering and absorption coefficients of the pigment layer, respectively. The scattering coefficient represents the rate of reflectance increase in a small thickness of the pigment layer; note that we assume the particle size of the pigments is sufficiently large relative to the wavelength in this work. The absorption coefficient represents the rate of transmittance decrease in a small thickness of the pigment layer. Since KM theory ignores interference between light of different wavelengths in absorption and scattering, the equations hold for each band independently; we therefore omit the argument $\lambda$ in this paper when an equation becomes complicated.

Solving the differential Eqs. (1) and (2) yields the reflectance ${R_{{\rm KM}}}(X;{R_\infty},S,{R_b})$ of a pigment layer of thickness $X$, given in Eq. (3), in terms of the substrate reflectance ${R_b}$ and the pigment layer reflectance ${R_\infty}$, i.e., the reflectance when $X = \infty$ and the layer is not affected by the substrate:

$$R_{\rm KM}(X;R_\infty,S,R_b) = \frac{\frac{1}{R_\infty}(R_b - R_\infty) - R_\infty\left(R_b - \frac{1}{R_\infty}\right)e^{SX\left(\frac{1}{R_\infty} - R_\infty\right)}}{(R_b - R_\infty) - \left(R_b - \frac{1}{R_\infty}\right)e^{SX\left(\frac{1}{R_\infty} - R_\infty\right)}}.$$
Note that Eq. (3) gives the substrate reflectance ${R_b}$ when $X = 0$.

We can express the pigment reflectance ${R_\infty}$ using the scattering coefficient $S$ and absorption coefficient $K$ as Eq. (4):

$${R_\infty}(S,K) = 1 + \frac{K}{S} - \sqrt {{{\left({\frac{K}{S}} \right)}^2} + 2\frac{K}{S}} .$$

Equation (4) shows that we can obtain the reflectance ${R_\infty}$ when pigment-specific scattering and absorption coefficients are given. Measuring the scattering and absorption coefficients directly from the pigments is difficult. Some methods have been proposed to derive them from measured reflectance [23] and user input [24]. We followed [23] to calculate the scattering and absorption coefficients of the pigments used in the experiments.
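
For concreteness, Eqs. (3) and (4) can be implemented directly. The following NumPy sketch evaluates them band by band; the function and variable names are ours, not from the paper.

```python
import numpy as np

def r_infinity(S, K):
    """Eq. (4): reflectance of a pigment layer thick enough to hide the substrate.
    S, K: per-band scattering and absorption coefficients, arrays of shape (d,)."""
    q = K / S
    return 1.0 + q - np.sqrt(q**2 + 2.0 * q)

def r_km(X, R_inf, S, R_b):
    """Eq. (3): KM reflectance of a pigment layer of thickness X over a substrate
    with reflectance R_b. Reduces to R_b at X = 0 and approaches R_inf as X grows."""
    e = np.exp(S * X * (1.0 / R_inf - R_inf))
    num = (R_b - R_inf) / R_inf - R_inf * (R_b - 1.0 / R_inf) * e
    den = (R_b - R_inf) - (R_b - 1.0 / R_inf) * e
    return num / den
```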

B. Pigment Mixing Theory

Duncan [25] defined the additivity of scattering and absorption coefficients in making mixed pigments from $n$ primary pigments. Let the scattering coefficients of primary pigments be ${S^1},{S^2}, \cdots ,{S^n}$, absorption coefficients of primary pigments be ${K^1},{K^2}, \cdots ,{K^n}$, and mixing ratio of primary pigments be ${C^1},{C^2}, \cdots ,{C^n}$, respectively. The scattering and absorption coefficients of the mixed pigments are ${S^{{\rm mix}}}$ and ${K^{{\rm mix}}}$, respectively. By using these variables, we can obtain Eq. (5):

$$\frac{{{K^{{\rm mix}}}}}{{{S^{{\rm mix}}}}} = \frac{{{C^1}{K^1} + {C^2}{K^2} + \ldots + {C^n}{K^n}}}{{{C^1}{S^1} + {C^2}{S^2} + \ldots + {C^n}{S^n}}},$$
where $\sum\nolimits_{i = 1}^n {C^i} = 1$.
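
Duncan's additivity implies that the mixture's coefficients are mixing-ratio-weighted sums of the primary pigments' coefficients, as used later in Eqs. (10) and (11). A minimal sketch continuing the NumPy code above (names are ours):

```python
def mix_coefficients(C, S_prim, K_prim):
    """Duncan's additivity: per-band coefficients of a pigment mixture, cf. Eq. (5).
    C: (n,) mixing ratios summing to one; S_prim, K_prim: (n, d) arrays holding
    the per-band coefficients of the n primary pigments."""
    return C @ S_prim, C @ K_prim  # S_mix, K_mix, each of shape (d,)
```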

4. ANALYSIS BY PHYSICS-BASED AUTOENCODER

In this section, we describe a physics-based autoencoder for unsupervised learning of the thickness and mixing ratio using hyperspectral images of layered surface objects as input.

A. Problem Setting

When a hyperspectral image of layered surface objects is measured, each pixel contains a mixture of two spectral reflectances, one from the pigment layer and the other from the substrate layer.

Let $d$ be the band of the hyperspectral image and $n$ be the number of pigments in the mixture. By KM theory, the spectral reflectance of each pixel in the hyperspectral image is composed of the spectral reflectance of the substrate ${R_b}$, and the pigment-derived variables: scattering coefficient ${S^i}$, absorption coefficient ${K^i}$, mixing ratio ${C^i}(i = 1,...,n)$, and thickness of the pigment layer $X$. ${R_b}$, ${S^i}$, and ${K^i}$ are defined as follows:

$$R_b = (R_b(\lambda_1), R_b(\lambda_2), \ldots, R_b(\lambda_d))^\intercal,$$
$$S^i = (S^i(\lambda_1), S^i(\lambda_2), \ldots, S^i(\lambda_d))^\intercal,$$
$$K^i = (K^i(\lambda_1), K^i(\lambda_2), \ldots, K^i(\lambda_d))^\intercal.$$

In this study, we consider the problem of estimating the thickness and mixing ratio on each pixel from the spectral reflectance, assuming that ${R_b}$, ${S^i}$, and ${K^i}$ are known. In the case of tumulus mural paintings and watercolors, we can suppose that component analysis of pigments has already clarified what kind of pigments are used.

There are several difficulties in adopting a supervised learning framework for estimating the thickness and mixing ratio of pigments. One difficulty lies in annotating the thicknesses and mixing ratios corresponding to the measured spectral reflectance. Although making many samples with various mixing ratios is possible, as [4] did, creating a sample with known thicknesses is unrealistic. Moreover, the dataset size sufficient for training is not known in advance. Another idea is to create training data by simulation. However, there is a discrepancy between simulated data and measured data, so the estimation error on measured data becomes large.

In this study, we propose a method for training and estimating the thickness and mixing ratio in an unsupervised framework. This allows the estimation of measured data without annotated thickness data.

B. Physics-Based Autoencoder

The goal of our physics-based autoencoder is to solve the inverse problem of the KM model. Figure 2 shows the overview of our physics-based autoencoder. First, the spectrum of an arbitrary pixel is passed through a neural network and encoded into the thickness and mixing ratio of the pixel. Then, the thickness and mixing ratio from the encoder is input to the decoder, a model based on KM theory, and the decoder reconstructs the spectrum. Finally, the encoder is optimized with the error between the input and output spectra.

Fig. 2. Overview of our physics-based autoencoder. This autoencoder takes the form of inputting spectral data $R$ at a pixel in the spectral image and restoring the input spectrum $\hat R$. The latent variables are physically interpretable parameters: pigment thickness $\hat X$ and pigment mixing ratio $\hat C$.

The encoder ${f_\Theta}$ is a multilayer perceptron, where $\Theta$ is the set of weights to be trained. ${f_\Theta}$ takes the spectrum $R \in {\mathbb{R}^d}$ of each pixel of the spectral image as input and outputs the estimates of the thickness $\hat X \in \mathbb{R}$ and the mixing ratio $\hat C \in {\mathbb{R}^n}$:

$$(\hat X,\hat C) = {f_\Theta}(R).$$
In our implementation, the encoder consists of seven fully connected layers with the rectified linear unit (ReLU) as the activation function, and the number of units in each layer is 300. As the activation functions of the final layer, we use a ReLU function to force the thickness $\hat X$ to be positive and use a Softmax function to normalize the pigment mixing ratios $\hat C$ so that the sum is one.
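
A sketch of such an encoder in PyTorch, the framework cited by the paper [26]. Splitting the seventh layer's output into one thickness unit and $n$ mixing-ratio units is our assumption, as the paper does not detail the head layout:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """f_Theta: per-pixel spectrum R (d bands) -> (X_hat, C_hat)."""
    def __init__(self, d, n, width=300):
        super().__init__()
        dims = [d] + [width] * 6
        self.hidden = nn.Sequential(*[
            m for i in range(6)
            for m in (nn.Linear(dims[i], dims[i + 1]), nn.ReLU())
        ])  # six hidden fully connected layers of 300 units each
        self.out = nn.Linear(width, 1 + n)  # seventh layer: thickness + ratios

    def forward(self, R):
        h = self.out(self.hidden(R))
        X_hat = torch.relu(h[..., 0])              # ReLU: nonnegative thickness
        C_hat = torch.softmax(h[..., 1:], dim=-1)  # Softmax: ratios sum to one
        return X_hat, C_hat
```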

The decoder receives $\hat X$ and $\hat C$ from the encoder. Using $\hat X$ and $\hat C$, the decoder calculates the mixed pigment parameters, scattering coefficient ${S^{{\rm mix}}}$ and absorption coefficient ${K^{{\rm mix}}}$:

$${S^{{\rm mix}}}(\hat C) = \sum\limits_{i = 1}^n {\hat C^i}{S^i},$$
$${K^{{\rm mix}}}(\hat C) = \sum\limits_{i = 1}^n {\hat C^i}{K^i}.$$

By substituting ${S^{{\rm mix}}}$ and ${K^{{\rm mix}}}$ into Eq. (4), we calculate the spectral reflectance $R_\infty ^{{\rm mix}}$ of the pigment layer alone, i.e., a mixed pigment layer thick enough to be unaffected by the substrate:

$$R_\infty ^{{\rm mix}} = {R_\infty}({S^{{\rm mix}}}(\hat C),{K^{{\rm mix}}}(\hat C)).$$

Finally, the decoder reconstructs spectrum $\hat R$, which is a mixture of the spectral reflectance of the pigment layer and that of the substrate, using ${S^{{\rm mix}}}$ and $R_\infty ^{{\rm mix}}$ calculated from the output of the encoder, the thickness $\hat X$ estimated by the encoder, and the spectral reflectance ${R_b}$ of the substrate:

$$\hat R(\hat X,\hat C) = {R_{{\rm KM}}}(\hat X;R_\infty ^{{\rm mix}}(\hat C),{S^{{\rm mix}}}(\hat C),{R_b}).$$
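
The entire decoder is thus a fixed, weight-free, differentiable function of $\hat X$ and $\hat C$. A PyTorch sketch of Eqs. (10)–(13), with shape conventions of our choosing:

```python
def decode(X_hat, C_hat, S, K, R_b):
    """Differentiable KM decoder. X_hat: (B,); C_hat: (B, n); S, K: (n, d);
    R_b: (d,). Returns reconstructed spectra R_hat of shape (B, d)."""
    S_mix = C_hat @ S                                  # Eq. (10)
    K_mix = C_hat @ K                                  # Eq. (11)
    q = K_mix / S_mix
    R_inf = 1.0 + q - torch.sqrt(q**2 + 2.0 * q)       # Eq. (12) via Eq. (4)
    e = torch.exp(S_mix * X_hat[:, None] * (1.0 / R_inf - R_inf))
    num = (R_b - R_inf) / R_inf - R_inf * (R_b - 1.0 / R_inf) * e
    den = (R_b - R_inf) - (R_b - 1.0 / R_inf) * e
    return num / den                                   # Eq. (13), i.e., Eq. (3)
```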

In the optimization of the encoder weights, we define the error function as follows:

$$L = \sum\limits_{R \in {\cal R}} \big\| {\hat R({f_\Theta}(R)) - R} \big\|_2^2 + {\lambda _{{\rm reg}}}\sum\limits_{\theta \in \Theta} \left\| \theta \right\|_2^2,$$
where ${\cal R}$ is the set of input spectra to be processed. The first term of the error function evaluates the reconstruction error between the input and reconstructed spectra. The second term is a regularizer on the encoder weights. Using this error function, we train the encoder weights with a gradient-based algorithm, obtaining the gradients by error backpropagation. This requires the decoder to be differentiable, which is achieved by automatic differentiation [26].

In this way, the encoder learns the thickness and mixing ratio of the pigment layer by the difference between the input spectrum and reconstructed spectrum. We used Adam [27] as the optimization algorithm.
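
A sketch of the resulting training loop; the batch size, learning rate, epoch count, and regularization weight below are our assumptions, as the paper does not report them:

```python
def train_encoder(encoder, spectra, S, K, R_b,
                  lam_reg=1e-4, lr=1e-3, epochs=100, batch_size=1024):
    """Unsupervised training with the loss of Eq. (14).
    spectra: (N, d) tensor of input spectra (the set R in Eq. (14))."""
    opt = torch.optim.Adam(encoder.parameters(), lr=lr)
    loader = torch.utils.data.DataLoader(
        torch.utils.data.TensorDataset(spectra),
        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for (R,) in loader:
            X_hat, C_hat = encoder(R)
            R_hat = decode(X_hat, C_hat, S, K, R_b)
            recon = ((R_hat - R) ** 2).sum()                        # first term
            reg = sum((p**2).sum() for p in encoder.parameters())   # second term
            loss = recon + lam_reg * reg
            opt.zero_grad()
            loss.backward()  # gradients flow through the KM decoder
            opt.step()
    return encoder
```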

5. EXPERIMENT

To evaluate the effectiveness of our method, we train the autoencoder on synthetic and measured data.

A. Quantitative Evaluation Using Synthetic Data

We quantitatively evaluate the performance of our model. As discussed in Section 4.A, creating a sample with known thicknesses is difficult. Thus, we simulate the spectral reflectance of a layered surface object based on KM theory with the known pigments’ thickness and mixing ratio. Then, we estimate them using our model and evaluate their accuracy.

1. Dataset

Figure 3(a) visualizes the RGB image of the synthesized spectral data, which has 81 bands in 400–700 nm, used in this experiment. The resolution of the image is 170 × 210, which means the autoencoder is trained on 35,700 spectra.

Fig. 3. Overview of synthetic data. (a) RGB image from spectral data. Data are synthesized based on the maps of (b) thickness and (c)–(e) mixing ratios of yellow, blue, and purple, respectively.

We synthesize the spectral image based on KM theory with the scattering and absorption coefficients of yellow, blue, and purple pigments. Figure 4 shows the spectral distributions of ${R_b}$, ${R_\infty}$, and ${S^i}$ of the pigments used for the synthesis. We prepared the maps of thickness $X(x,y)$ and mixing ratios $C(x,y)$ as presented in Figs. 3(b)–3(e). There are four different pigmented areas in the figure, and the rest of the figure is only the substrate with zero pigment thickness. The three areas on the left, in order from left to right, are monochromatic areas with pure yellow, blue, and purple. The fourth (rightmost) area is a mixture of yellow and blue with mixing ratios of 0.8 and 0.2, respectively. The thickness of the pigmented areas increases toward the bottom as shown in Fig. 3(b). Note that the maximum thickness varies among the pigmented regions because the thickness that can completely obscure the substrate is different for each pigment according to the ratio of scattering and absorption.

Fig. 4. Set of pigments’ ${R_\infty}$ and $S$ used for synthesis.

We added Gaussian noise with mean zero and standard deviation 0.003 to each band of the spectrum of each pixel. We train the autoencoder with the synthetic data until the error function converges to a fixed value.
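
Under these settings, each pixel of the synthetic image can be generated from its ground-truth thickness and mixing ratios with the NumPy helpers sketched in Section 3; the noise level follows the text, and the function names are ours:

```python
def synthesize_spectrum(X, C, S_prim, K_prim, R_b, sigma=0.003, rng=None):
    """One noisy synthetic spectrum from ground-truth (X, C) via KM theory."""
    rng = np.random.default_rng(0) if rng is None else rng
    S_mix, K_mix = mix_coefficients(C, S_prim, K_prim)  # Duncan additivity
    R = r_km(X, r_infinity(S_mix, K_mix), S_mix, R_b)   # noiseless KM spectrum
    return R + rng.normal(0.0, sigma, size=R.shape)     # per-band Gaussian noise
```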

2. Quantitative Evaluation

We evaluated the estimates of the thickness $\hat X(x,y)$ and mixing ratios $\hat C(x,y)$, which are obtained as the output of the encoder.

Figures 5 and 6 compare the thicknesses $X(x,y)$ and $\hat X(x,y)$ for each pigmented area: Fig. 5 as maps and Fig. 6 as a plot, where the horizontal axis is the value used for synthesis and the vertical axis is the estimated value. $X(x,y)$ and $\hat X(x,y)$ are almost the same, indicating that our method correctly estimates thickness. Except for purple, the estimation error is larger where the pigment is thicker. As formulated in Eq. (3), the reflectance converges to ${R_\infty}$ as the thickness becomes large. Therefore, the trend in Fig. 6 is reasonable: an error in the estimated thickness has little impact on the spectrum when the pigment is thick. This is a limitation of our physics-based method.

Figure 7 shows the estimates of the mixing ratios. Our method correctly estimates the mixing ratio of each pigmented area. However, in the upper part of each area, where the pigments are thin, the mixing ratios differ from those presented in Figs. 3(c)–3(e). This is because in regions where the pigments are thin, the spectrum of the substrate dominates, and errors in the mixing ratios barely affect the spectrum. Although our method fails to estimate the mixing ratios for the area with only the substrate, this does not matter since the thickness there is correctly estimated as almost zero.

B. Estimation for Measured Data (Pigments of Decorated Tomb Murals)

Although the synthetic data enable quantitative evaluation, there may exist a discrepancy between the simulated data and measured data as discussed in Section 4.A. Therefore, we conducted an experiment using measured data to qualitatively evaluate the estimated thickness.

Controlling the mixture of some pigments is possible; however, pigments are not mixed in tomb murals [28]. Thus, we used an object painted with pure pigments but still allowed the model to estimate the mixing ratio of the pigments.

Fig. 5. (a) Original thickness $X(x,y)$ and (b) estimated thickness $\hat X(x,y)$. As can be seen from the mean absolute error map (c), the error tends to increase with thickness.

Fig. 6. Thickness estimation results from synthetic data, plotted together with the original values. The closer the points lie to the line $y = x$, the higher the accuracy.

Fig. 7. Estimation results of mixing ratios on synthetic data.

1. Dataset

We prepared samples of three pigments applied to fine paper: yellow (Inari-Oudo), red (Bengara), and light blue (Ainezumi), which are among the primary pigments used for the wall paintings of decorated tombs. In these samples, we did not mix the pigments but applied each pigment as it is. To make the pigments adhere well to the paper, we added a glue solution.

We measured the samples’ spectra using a hyperspectral camera (SOC710-VP, Surface Optics). This hyperspectral camera measures 128 bands from 380 to 1080 nm, but we used data from 70 bands between 440 and 810 nm for this experiment. The light source in the environment was filament lamps, and the spectral data were normalized using a white reflectance standard.
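
White-standard normalization conventionally divides the measured signal by that of the white reference; the sketch below, including the optional dark-frame subtraction, reflects common practice rather than the paper's exact procedure:

```python
def to_reflectance(raw, white, dark=None):
    """Per-band reflectance from raw counts and a white-standard measurement."""
    dark = np.zeros_like(white) if dark is None else dark
    return (raw - dark) / (white - dark)
```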

Figure 8(a) shows an RGB image converted from the spectral data. The resolution of the image is 520 × 410, which means the autoencoder is trained on 213,200 spectra. The lines, in order, are yellow, red, light blue, yellow, red, and light blue. As can be seen from Fig. 8(a), the thickness of the pigments varies because of the brush strokes.

Fig. 8. (a) RGB images converted from an original measured spectral image ${\textbf R}$. (b) RGB images converted from a reconstructed spectral image $\hat R$. (c) Mean squared errors in (a) and (b) spectra, which have a theoretical maximum value of 1.0. (d) Estimation results of pigment thickness $\hat X$. (e) Simulation results when the thickness is tripled based on the results in (d).

2. Evaluation of the Results

Figure 8(b) shows the RGB image converted from the spectra reconstructed by our autoencoder, and Fig. 8(c) shows the mean squared error between the (a) and (b) spectra. The reconstructed image is similar to the measured image [Fig. 8(a)]. However, the errors tend to be larger at pigment edges. At the edges, we suspect that some phenomena do not fit the KM model; for example, an overly thick pigment deposit may have changed the surface normal direction.

Figure 8(d) shows the estimated thickness of the pigments. Since the range of thickness varies among pigments, Fig. 9(d) shows the relative variations of the estimated thickness in each of the pigments. Each column corresponds to an area indicated in Fig. 8(b). By referring to Figs. 8(a) and 8(d), we can see that the estimated thickness reflects the variation of pigments caused by the brush stroke.

Fig. 9. Magnification of images and the estimated thickness [see Fig. 8(b)]. (a) Measured image $R$. (b) Reconstructed image $\hat R$. (c) Average mean error of spectrum. (d) Estimated thickness $\hat X$. (e) Enhanced image with three times thickness.

Figure 10 shows the estimated mixing ratios of the pigments. This result indicates that our autoencoder discriminates the region of each pigment. In the non-pigmented region that exposes the substrate, the estimated mixing ratios are almost 100% light blue. Our guess is that the paint was not completely dry, which may have caused a small specular reflection component to be measured. As discussed in the experiment with synthetic data, these mixing ratios cause no problems since the thickness is correctly estimated as zero. A nonnegligible error is found in the red-pigmented area, which our encoder incorrectly estimated as a mixture containing light blue. The reason is the influence of the glue solution: the glue solution is transparent but not completely colorless, and thus has a slight spectral characteristic. In areas where the pigment is thick, we suspect that some glue solution does not evaporate but forms a layer that affects the spectrum.

Fig. 10. Estimated mixing ratios for measured data.

Figure 8(e) demonstrates the enhanced image when the thickness is three times the estimated value. Layered surface objects such as decorated tomb murals and watercolors may be affected by natural degradation that thins the pigment layer. Because our spectral analysis is based on KM theory, our method enables the synthesis of the original colors before degradation. This result shows the potential for applications in digital preservation such as e-Heritage by Ikeuchi et al. [28].

C. Estimation for Measured Data (Pigments of Watercolors)

Unlike the pigments of tomb murals, those of watercolors are assumed to be mixed. Therefore, we conducted a qualitative evaluation of thickness and a quantitative one of the mixing ratio using measured data of mixed watercolors.

1. Dataset

We prepared two samples and measured their spectra. Both were created by mixing Pentel watercolor pigments on a palette and applying them to fine paper. The setup for spectral measurement is the same as in Section 5.B.1. Table 1 lists the pigment colors used in each sample, and Fig. 11 shows RGB images converted from each set of spectral data. Each pigmented area has its own mixing ratio. The resolutions of the spectral images are 440 × 390 for sample 1 and 680 × 465 for sample 2, which means the autoencoder is trained on 171,600 spectra for sample 1 and 316,200 spectra for sample 2.

Fig. 11. RGB images from spectral data. Each pigment area has its own unique mix ratio.

2. Evaluation of the Results of Sample 1

Figure 12(b) shows the RGB image converted from the spectra reconstructed by our autoencoder, and Fig. 12(c) shows the mean squared error between the (a) and (b) spectra. The reconstructed image is similar to the measured image. Where the cobalt blue and vermilion lines overlap, the thickness is estimated to be greater, which is reasonable considering the overlap of the two pigment layers.

Fig. 12. Estimation result for sample 1. (a) RGB images converted from an original measured spectral image $R$. (b) RGB images converted from a reconstructed spectral image $\hat R$. (c) Mean squared errors in (a) and (b) spectra, which have a theoretical maximum value of 1.0. (d) Estimation results of pigment thickness $\hat X$.

Table 1. Pigments Used for Samples

Figure 13 shows the estimated mixing ratios of the pigments. Although the estimation is generally successful, the upper left circular region has an error: it contains a small amount of blue. We think this is because of measurement errors in the derivation of the scattering and absorption coefficients of the monochromatic pigments. As shown in Fig. 13(b), the estimation succeeds even when the number of candidate pigment colors is larger than the number of colors actually used. This result indicates the generality of the method.

Fig. 13. Estimated mixing ratios for sample 1: (a) with only used pigments and (b) with pigments including other than those used.

Fig. 14. Estimation result for sample 2. (a) RGB images converted from an original measured spectral image $R$. (b) RGB images converted from a reconstructed spectral image $\hat R$. (c) Mean squared errors in (a) and (b) spectra, which have a theoretical maximum value of 1.0. (d) Estimation results of pigment thickness $\hat X$.

3. Evaluation of the Results of Sample 2

Figure 14(b) shows the RGB image converted from the spectra reconstructed by our autoencoder, and Fig. 14(c) shows the mean squared error between the (a) and (b) spectra. The reconstructed image is similar to the measured image. We defined the paper spectrum as the constant ${R_b}$ in the KM model, which leads to errors in the reconstructed spectrum [as shown in Fig. 16(a)]. However, considering that the purpose of this study is to analyze pigments, the errors in the paper area can be ignored since the thickness of the pigment there is correctly estimated to be zero.

Fig. 15. Estimated mixing ratios for sample 2.

Figure 15 shows the estimated mixing ratios of the pigments. The line areas (pigments not mixed) are estimated correctly, but the circle areas (pigments mixed) have errors. Figure 16(b) compares the original and reconstructed spectra at a point in the upper left circle, showing that the spectra are almost the same even though the estimated mixing ratio is wrong. Watercolors follow the theory of subtractive color mixing, in which mixing many colors produces a dark color with low saturation. In this case, the visible spectrum is flat with few characteristics, so different pigment combinations can produce similar dark colors. The results of sample 2 indicate a limitation of our physics-based autoencoder: since we use spectra as input, this many-to-one relationship between mixing ratios and spectra causes estimation errors.

Fig. 16. Comparison of measured and reconstructed spectra for sample 2. (a) Comparison of a point in the paper area, showing a linear dependent-like relationship. (b) Comparison of spectra of a point in the upper left circle, showing similar spectra.

D. Comparison with Supervised Learning

Finally, to justify the discussion in Section 4.A, we compare our model with a model trained in a supervised manner using synthetic data.

1. Supervised Learning Method

We trained an encoder ${f_\Theta}$ in a supervised manner using simulated spectral reflectance as the input and corresponding thickness and mixing ratios as the supervision. As the error functions, we use the squared error and cross-entropy loss for the thickness and mixing ratios, respectively.
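
A sketch of this supervised loss in PyTorch; the equal weighting of the two terms is our assumption (Section 5.D.3 notes that balancing them is not trivial):

```python
def supervised_loss(X_hat, C_hat, X_true, C_true, eps=1e-8):
    """Squared error on thickness plus cross-entropy on mixing ratios."""
    l_thickness = ((X_hat - X_true) ** 2).mean()
    l_ratio = -(C_true * torch.log(C_hat + eps)).sum(dim=-1).mean()
    return l_thickness + l_ratio  # relative weighting assumed equal
```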

2. Training Dataset

For training data, we synthesized the dataset of spectral reflectance using the measured spectral reflectance of the substrate ${R_b}$ and the scattering and absorption coefficients $(S,K)$ of yellow, red, and light blue. We set the mixing ratio of each pigment in the range [0,1] with a step of 0.01. We also set the thickness of each pigment in the range [0,5] with a step of 0.1. The total size of the training dataset was 262,701.

For test data, we used the same data as in Section 5.B.1 to compare the result with our unsupervised learning.

3. Comparison

Figure 17 shows the estimated thickness and mixing ratios for the test data. We also reconstructed the spectral image from the estimated parameters based on KM theory. Compared with the measured image in Fig. 8(a), the overall result of the reconstruction is not good, particularly in the upper part that exposes the substrate, and in the lower area of light blue.

Fig. 17. Estimation results from supervised learning.

There are several differences between our model and the supervised model. First, the design of the losses differs: our model evaluates the error in the spectra, whereas the compared model evaluates the errors in the thickness and mixing ratios. The latter must balance parameters with different physical meanings, which is not trivial. Second, supervised learning needs a prior probability distribution of the thickness and mixing ratios to synthesize the training dataset, and this prior may differ from the distribution underlying the input. Since our approach optimizes the parameters directly on the input, it is free of such an out-of-distribution gap. As a result, our method achieves much better results when trained on 213,200 pixels of measured data, which is fewer than the number of training samples used for supervised learning.

6. CONCLUSION AND FUTURE WORK

A. Conclusion

In this study, we proposed an unsupervised learning method for estimating the thickness and mixing ratios of pigments on layered surface objects. The significance of this research is that it shows the effectiveness of combining analytical models with deep learning to solve inverse problems in physical models. We adopted a physics-based model, the light reflection model based on KM theory, as the decoder of the autoencoder. Our autoencoder has interpretable latent variables at the intermediate layer: the thickness and mixing ratios. We confirmed the effectiveness of our method with synthetic data and a hyperspectral image of a real object. We also verified the superiority of our method by comparison with supervised learning on data generated according to the physics-based model.

B. Limitation and Future Work

Our method has some limitations, which suggest directions for future work.

The accuracy of the estimates depends on the physical information available from the spectra, since this study takes a physics-based approach. If the pigment thickness is large, or if subtractive mixing produces a neutral color, it becomes more difficult to obtain cues for estimation. Also, if different pigment combinations can produce similar spectra, errors occur in estimating the mixing ratios.

The scattering and absorption coefficients of the primary pigments need to be known. Errors in these coefficients may affect the accuracy of the estimated thickness and mixing ratios, and since the coefficients are obtained by measurement, such errors are inevitable. Thus, jointly adjusting the coefficients during estimation may improve the accuracy of the thickness and mixing ratio estimates.

As the physics-based model of the decoder, we used KM theory with two light fluxes. This is a simple optical model for the direct problem, and we intend to adopt more complex models in the future. A more complicated model such as the four-flux model [29] may improve the accuracy of the estimation. Also, if the particles are smaller and the scattering is more complex, it would be necessary to use advanced models of radiation propagation, such as Maxwell’s equations [30] and the radiative transfer method [31].

We used an autoencoder with spectral data as input. Improving the model’s generalization capacity by adding other unannotated data is also future work.

Disclosures

The authors declare no conflicts of interest.

Data availability

Data underlying the results presented in this paper are not publicly available at this time but may be obtained from the authors upon reasonable request.

REFERENCES

1. F. M. Abed, “Pigment identification of paintings based on Kubelka-Munk theory and spectral images,” Ph.D. thesis (Rochester Institute of Technology, 2014).

2. P. Kubelka and F. Munk, “An article on optics of paint layers,” Z. Tech. Phys. 12, 259–274 (1931).

3. T. Morimoto, R. T. Tan, R. Kawakami, and K. Ikeuchi, “Estimating optical properties of layered surfaces using the spider model,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010), pp. 207–214.

4. K. Fukumoto, N. Tsumura, and R. Berns, “Estimating pigment concentrations from spectral images using an encoder-decoder neural network,” J. Imaging Sci. Technol. 64, 30502 (2020).

5. L. Shi, V. Babaei, C. Kim, M. Foshey, Y. Hu, P. Sitthi-Amorn, S. Rusinkiewicz, and W. Matusik, “Deep multispectral painting reproduction via multi-layer, custom-ink printing,” ACM Trans. Graph. 37, 271 (2018).

6. H. Kato, D. Beker, M. Morariu, T. Ando, T. Matsuoka, W. Kehl, and A. Gaidon, “Differentiable rendering: a survey,” arXiv:2006.12057 (2020).

7. A. M. N. Taufique and D. W. Messinger, “Hyperspectral pigment analysis of cultural heritage artifacts using the opaque form of Kubelka-Munk theory,” Proc. SPIE 10986, 1098611 (2019).

8. S. Lyu, D. Meng, M. Hou, S. Tian, C. Huang, and J. Mao, “Nonlinear mixing characteristics of reflectance spectra of typical mineral pigments,” Minerals 11, 626 (2021).

9. C. Clementi, C. Miliani, G. Verri, S. Sotiropoulou, A. Romani, B. G. Brunetti, and A. Sgamellotti, “Application of the Kubelka–Munk correction for self-absorption of fluorescence emission in carmine lake paint layers,” Appl. Spectrosc. 63, 1323–1330 (2009).

10. K. A. Dooley, D. M. Conover, L. D. Glinsman, and J. K. Delaney, “Complementary standoff chemical imaging to map and identify artist materials in an early Italian renaissance panel painting,” Angew. Chem. 126, 13995–13999 (2014).

11. N. Pan, M. Hou, S. Lv, Y. Hu, X. Zhao, Q. Ma, S. Li, and A. Shaker, “Extracting faded mural patterns based on the combination of spatial-spectral feature of hyperspectral image,” J. Cult. Herit. 27, 80–87 (2017).

12. H. Deborah, S. George, and J. Y. Hardeberg, “Spectral-divergence based pigment discrimination and mapping: a case study on The Scream (1893) by Edvard Munch,” J. Am. Inst. Conserv. 58, 90–107 (2019).

13. S. Baronti, A. Casini, F. Lotti, and S. Porcinai, “Multispectral imaging system for the mapping of pigments in works of art by use of principal-component analysis,” Appl. Opt. 37, 1299–1309 (1998).

14. S. Mosca, R. Alberti, T. Frizzi, A. Nevin, G. Valentini, and D. Comelli, “A whole spectroscopic mapping approach for studying the spatial distribution of pigments in paintings,” Appl. Phys. A 122, 815 (2016).

15. C. Balas, G. Epitropou, A. Tsapras, and N. Hadjinicolaou, “Hyperspectral imaging and spectral classification for pigment identification and mapping in paintings by El Greco and his workshop,” Multimedia Tools Appl. 77, 9737–9751 (2018).

16. J. K. Delaney, K. A. Dooley, A. Van Loon, and A. Vandivere, “Mapping the pigment distribution of Vermeer’s Girl with a Pearl Earring,” Herit. Sci. 8, 4 (2020).

17. E. Kirchner, I. van der Lans, F. Ligterink, E. Hendriks, and J. Delaney, “Digitally reconstructing Van Gogh’s Field with Irises Near Arles. Part 1: varnish,” Color Res. Appl. 43, 150–157 (2018).

18. E. Kirchner, I. van der Lans, F. Ligterink, M. Geldof, A. Ness Proano Gaibor, E. Hendriks, K. Janssens, and J. Delaney, “Digitally reconstructing Van Gogh’s Field with Irises Near Arles. Part 2: pigment concentration maps,” Color Res. Appl. 43, 158–176 (2018).

19. E. Kirchner, I. van der Lans, F. Ligterink, M. Geldof, L. Megens, T. Meedendorp, K. Pilz, and E. Hendriks, “Digitally reconstructing Van Gogh’s Field with Irises Near Arles. Part 3: determining the original colors,” Color Res. Appl. 43, 311–327 (2018).

20. S. Tulsiani, T. Zhou, A. A. Efros, and J. Malik, “Multi-view supervision for single-view reconstruction via differentiable ray consistency,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2017), pp. 2626–2634.

21. G. Pavlakos, L. Zhu, X. Zhou, and K. Daniilidis, “Learning to estimate 3D human pose and shape from a single color image,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (2018), pp. 459–468.

22. F. Bogo, A. Kanazawa, C. Lassner, P. Gehler, J. Romero, and M. J. Black, “Keep it SMPL: automatic estimation of 3D human pose and shape from a single image,” in European Conference on Computer Vision (2016), pp. 561–578.

23. P. Kubelka, “New contributions to the optics of intensely light-scattering materials. Part I,” J. Opt. Soc. Am. 38, 448–457 (1948).

24. C. J. Curtis, S. E. Anderson, J. E. Seims, K. W. Fleischer, and D. H. Salesin, “Computer-generated watercolor,” in Proceedings of the 24th Annual Conference on Computer Graphics and Interactive Techniques (1997), pp. 421–430.

25. D. R. Duncan, “The colour of pigment mixtures,” Proc. Phys. Soc. 52, 390 (1940).

26. A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS Workshop on Autodiff (2017).

27. D. P. Kingma and J. Ba, “Adam: a method for stochastic optimization,” arXiv:1412.6980 (2014).

28. K. Ikeuchi, T. Morimoto, M. Kamakura, N. Kuchitsu, K. Kawano, and T. Ikeda, “Kyushu decorative tumuli project: from e-heritage to cyber-archaeology,” Int. J. Comput. Vis. 130, 1609–1626 (2022).

29. L. Simonot, R. D. Hersch, M. Hébert, and S. Mazauric, “Multilayer four-flux matrix model accounting for directional-diffuse light transfers,” Appl. Opt. 55, 27–37 (2016).

30. A. Egel, K. M. Czajkowski, D. Theobald, K. Ladutenko, A. S. Kuznetsov, and L. Pattelli, “Smuthi: a Python package for the simulation of light scattering by multiple particles near or between planar interfaces,” J. Quant. Spectrosc. Radiat. Transfer 273, 107846 (2021).

31. T. Väisänen, J. Markkanen, A. Penttilä, and K. Muinonen, “Radiative transfer with reciprocal transactions: numerical method and its implementation,” PLoS One 14, e0210155 (2019).
