Retinal optical coherence tomography image analysis by a restricted Boltzmann machine

Open Access

Abstract

Optical coherence tomography (OCT) is an emerging imaging technique for ophthalmic disease diagnosis. Two major problems in OCT image analysis are image enhancement and image segmentation. Deep learning methods have achieved excellent performance in image analysis. However, most deep learning-based image analysis models are supervised approaches and need a high volume of training data (e.g., reference clean images for image enhancement and accurately annotated images for segmentation). Since acquiring reference clean images for OCT image enhancement and accurately annotating a high volume of OCT images for segmentation are both hard, it is difficult to extend these deep learning methods to OCT image analysis. We propose an unsupervised learning-based approach for OCT image enhancement and abnormality segmentation, where the model can be trained without reference images. The image is reconstructed by a restricted Boltzmann machine (RBM) by defining a target function and minimizing it. For OCT image enhancement, each image is independently learned by the RBM network and is eventually reconstructed. In the reconstruction phase, we use the ReLU function instead of the sigmoid function. Reconstruction of images by the RBM network leads to improved image contrast in comparison to other competitive methods in terms of contrast-to-noise ratio (CNR). For anomaly detection, hyperreflective foci (HF), one of the first signs in retinal OCTs of patients with diabetic macular edema (DME), are identified based on image reconstruction by the RBM and post-processing that removes HF candidates outside the area between the first and the last retinal layers. Our anomaly detection method achieves a high ability to detect abnormalities.

© 2022 Optica Publishing Group under the terms of the Optica Open Access Publishing Agreement

1. Introduction

The retina is a multilayered structure located in the innermost layer of the eye and is composed of several layers of cells. The retina consists of light-receptor and nerve cells that convert light energy into nerve signals for the brain to analyze. Among the most common retinal diseases are age-related macular degeneration (AMD), glaucoma, and diabetic macular edema (DME). One of the most important diagnostic methods for these diseases is precise imaging of the layers and vessels of the retina. Examples of such imaging techniques are fundus imaging and optical coherence tomography (OCT). Fundus images are high-quality color images of the rear of the eye that show the retinal vessels. OCT is a newer and more advanced technique that provides high-precision tomographic sections of the retinal layers and gives the ophthalmologist valuable information [1]. By analyzing retinal images, valuable information can be extracted and, in some cases, a variety of diseases and complications can be identified (Fig. 1).

Fig. 1. Retinal layers and abnormalities in an OCT image [2]. a) Retinal layers. b) Retinal layer names in a normal B-scan. c) AMD abnormality in a B-scan (left), DME abnormality in a B-scan (right).

With the advances in OCT imaging, we have a lot of visual information about retinal abnormalities such as cystoid areas and corrupted retinal layers in AMD, glaucoma and DME. However, without an intelligent diagnostic system, analyzing these images is difficult and time consuming, and the large volume of information makes it impossible to provide professional and adequate information about retinal disorders.

OCT image analysis is performed in the following three areas:

  • 1. Image enhancement and denoising [3–6],
  • 2. Diagnosis and detection of retinal abnormalities [7–12],
  • 3. Segmentation of retinal layers [13–18].
In recent years, deep learning [19] has gained the attention of researchers in various fields such as image enhancement [20–24] and image segmentation [25–29]. These networks have end-to-end structures, i.e., features are learned automatically as part of the network learning process, which eliminates the need for hand-crafted features. Recent studies have reported high performance of these models on ordinary images. However, there are significant challenges in using these models in real clinical applications. The effectiveness of these methods depends on a high volume of training data. The lack of large image datasets from multiple OCT devices and non-standard imaging is one of the most critical barriers to applying artificial intelligence to OCT analysis.

In OCT image enhancement, the lack of a large clean image dataset is the main challenge for deep learning approaches. In real clinical applications, a high volume of clean images is not available, whereas most deep learning-based image enhancement methods require large amounts of training data. Therefore, it is not possible to extend these methods to OCT image enhancement.

In OCT image segmentation and abnormality detection, accurate annotation of images is fundamental for medical image analysis using supervised deep learning algorithms. Due to the sensitivity of medical images, this can only be done with the help of a specialist physician. Annotation and segmentation of retinal OCT is highly challenging due to eye movements resulting in motion artifacts and a poor signal-to-noise ratio. The segmentation is also particularly difficult in abnormal eyes with AMD and DME, specifically because of the damage to retinal blood vessels and the accumulation of fluid between the retinal layers. These inherent challenges make the interpretation of an OCT image cumbersome and often highly variable among experts. Consequently, manual annotation of OCT images is very subjective and time consuming. The effectiveness of supervised deep learning methods depends on a high volume of labeled data. So, even with the great achievements in medical image segmentation in recent years, the accuracy of deep learning-based medical image segmentation is still insufficient to meet actual clinical requirements [30].

Unsupervised learning can address these challenges, and developing unsupervised methods for analyzing OCT images is essential to aid accurate diagnosis.

In this paper, we focus on OCT image enhancement and on identifying hyperreflective foci (HF) in OCT B-scans using the restricted Boltzmann machine (RBM) as an unsupervised learning method. The RBM [31,32] is a generative probabilistic model that uses a hidden layer to model a distribution over its observable variables. The RBM extracts high-level features from unlabeled data and increases the power of differentiation between different categories in the data.

The remaining parts of the paper are organized as follows. Since we use the RBM to model unlabeled retinal data (normal and abnormal retina), Section 2 introduces the structure of the RBM for reconstructing an OCT image. Section 3 describes the proposed method for improving the quality of OCT images. Section 4 explains the application of the RBM for detecting and locating HFs in B-scan data; HFs are among the first symptoms of retinal disease and appear as bright spots in OCT images. Section 5 concludes the paper.

2. Restricted Boltzmann machine

A Boltzmann machine is a particular type of Markov random field. It is a symmetric network with binary random units. This network consists of a set of visible units ${\boldsymbol v} = \{ {v_1}, \ldots ,{v_{{g_v}}}\} $ and a set of hidden units ${\boldsymbol h} = \{ {h_1}, \ldots ,{h_{{g_h}}}\} $, where ${v_i} \in \{ 0,1\} $ and ${h_j} \in \{ 0,1\} $. The numbers of visible and hidden units are ${g_v}$ and ${g_h}$, respectively. The Boltzmann machine can be represented as a weighted undirected graph with the two sets of units ${\boldsymbol v}$ and ${\boldsymbol h}$ as vertices. Let the weights of the connections between ${v_i}$ and ${v_j}$ be denoted by ${L_{i,j}}$, between ${h_i}$ and ${h_j}$ by ${J_{i,j}}$, and between ${v_i}$ and ${h_j}$ by ${W_{i,j}}$. We consider the corresponding symmetric weight matrices ${\boldsymbol L} = ({L_{i,j}})_{i,j = 1}^{{g_v}}$ and ${\boldsymbol J} = ({J_{j,i}})_{j,i = 1}^{{g_h}}$ with zero diagonals, as well as ${\boldsymbol W} = ({W_{i,j}})_{i,j = 1}^{{g_v},{g_h}}$.

Then the state energy of the Boltzmann machine (without bias) is defined as:

$$E({\boldsymbol v},{\boldsymbol h}) ={-} \left( {\frac{1}{2}{{\boldsymbol v}^T}{\boldsymbol Lv} + \frac{1}{2}{{\boldsymbol h}^T}{\boldsymbol Jh} + {{\boldsymbol v}^T}{\boldsymbol Wh}} \right).$$

To reduce the complexity of the Boltzmann machine we consider the restricted Boltzmann machine (RBM), where there are no weights between the visible units, i.e. ${\boldsymbol L} = {\boldsymbol 0}$, and no weights between the hidden units, i.e. ${\boldsymbol J} = {\boldsymbol 0}$. The corresponding graph then consists of the two layers ${\boldsymbol v}$ and ${\boldsymbol h}$, and we only have connections between these layers with weights ${W_{i,j}}$. Incorporating also a possible bias, the energy of an RBM state equals

$$E({\boldsymbol v},{\boldsymbol h}) ={-} ({{{\boldsymbol v}^T}{\boldsymbol Wh} + {{\boldsymbol a}^T}{\boldsymbol v} + {{\boldsymbol b}^T}{\boldsymbol h}} )={-} \left( {\sum\limits_{i = 1}^{{g_v}} \sum\limits_{j = 1}^{{g_h}} {W_{i,j}}{v_i}{h_j} + \sum\limits_{i = 1}^{{g_v}} {a_i}{v_i} + \sum\limits_{j = 1}^{{g_h}} {b_j}{h_j}} \right),$$
where ${\boldsymbol a} = ({a_i})_{i = 1}^{{g_v}}$ and ${\boldsymbol b} = ({b_j})_{j = 1}^{{g_h}}$ are the bias vectors, see [33].

Using this energy, a probability can be assigned to any pair of visible and hidden unit vectors ${\boldsymbol v}$ and ${\boldsymbol h}$. The joint distribution is defined as:

$$P({\boldsymbol v},{\boldsymbol h}) = \frac{1}{Z}\textrm{exp} ( - E({\boldsymbol v},{\boldsymbol h})),$$
where Z is the normalization constant (also called “partition function” [33]) which is obtained by the sum of all ${2^{{g_v}}}$ states of observable and all ${2^{{g_h}}}$ states of hidden vectors,
$$Z = \sum\limits_{{\boldsymbol v} \in {{\{ 0,1\} }^{{g_v}}}} \sum\limits_{{\boldsymbol h} \in {{\{ 0,1\} }^{{g_h}}}} \textrm{exp} ( - E({\boldsymbol v},{\boldsymbol h})).$$

The (marginal) probability that the RBM attributes to a fixed visible vector ${\boldsymbol v}$ is then

$$P({\boldsymbol v}) = \sum\limits_{{\boldsymbol h} \in {{\{ 0,1\} }^{{g_h}}}} P({\boldsymbol v},{\boldsymbol h}) = \frac{1}{Z}\sum\limits_{{\boldsymbol h} \in {{\{ 0,1\} }^{{g_h}}}} \textrm{exp} ( - E({\boldsymbol v},{\boldsymbol h})).$$

The probability $P({\boldsymbol v})$ that the network attributes to the training data can be increased by adjusting the weights and biases so that these data obtain lower energy and other data obtain higher energy, which changes the probability accordingly.
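As a minimal illustration of these definitions (not part of the original method), the following Python sketch evaluates the energy of Eq. (2) and the corresponding unnormalized probability exp(−E) for a small binary RBM; all sizes and parameter values are arbitrary assumptions for the example.

import numpy as np

# Minimal sketch (illustrative only): energy and unnormalized probability of a binary RBM.
g_v, g_h = 6, 3                          # numbers of visible and hidden units (arbitrary)
rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.01, (g_v, g_h))    # visible-hidden weights
a = np.zeros(g_v)                        # visible bias
b = np.zeros(g_h)                        # hidden bias

def energy(v, h):
    # E(v, h) = -(v^T W h + a^T v + b^T h), Eq. (2)
    return -(v @ W @ h + a @ v + b @ h)

v = rng.integers(0, 2, size=g_v).astype(float)
h = rng.integers(0, 2, size=g_h).astype(float)
print(energy(v, h), np.exp(-energy(v, h)))   # probability up to the normalization factor 1/Z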

2.1 Inference process

The application of RBM consists of two phases: 1) Forward phase and 2) Backward phase (reconstruction).

Phase 1) Since the graph structure of the RBM does not admit intra-layer connections, the hidden unit activations are independent given the state of the visible layer vector ${\boldsymbol v}$. Vice versa, the visible unit activations are independent given the state of the hidden vector ${\boldsymbol h}$. Thus, for given observable data ${\boldsymbol v}$, the unit ${h_j}$ is equal to 1 with the following probability:

$$P({h_j} = 1|{\boldsymbol v}) = \sigma \left( {{b_j} + \sum\limits_{i = 1}^{{g_v}} {W_{i,j}}{v_i}} \right),\quad \quad j = 1, \ldots ,{g_h},$$
where $\sigma (x) = 1/(1 + \textrm{exp} ( - x))$ denotes the sigmoidal function. (see Proof A)

In the first phase we use the given visible data ${\boldsymbol v}$ as input and apply (6) to update the states of the hidden units ${h_j}$.

Phase 2) Conversely, using the independence of the visible unit activations, we obtain similarly as above the conditional probability for a state of ${v_i}$ for a given vector of hidden units ${\boldsymbol h}$,

$$P({v_i} = 1|{\boldsymbol h}) = \sigma \left( {{a_i} + \sum\limits_{j = 1}^{{g_h}} {W_{i,j}}{h_j}} \right),\quad \quad i = 1, \ldots ,{g_v}.$$

In the second phase (reconstruction phase), we use the samples from the hidden layer as input to update the visible states. Observe that we have the same weight matrix in (6) and (7), while the bias vector ${\boldsymbol b}$ only appears in (6), and the bias ${\boldsymbol a}$ only appears in (7).
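To make the two phases concrete, a short Python sketch (ours, not the authors' code) of the conditional probabilities (6) and (7) together with the corresponding binary sampling steps could look as follows; W, a and b are assumed to be NumPy arrays of shapes (g_v, g_h), (g_v,) and (g_h,).

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    # Phase 1, Eq. (6): P(h_j = 1 | v), then a binary sample of the hidden layer.
    p_h = sigmoid(b + v @ W)
    return p_h, (rng.random(p_h.shape) < p_h).astype(float)

def sample_visible(h, W, a, rng):
    # Phase 2, Eq. (7): P(v_i = 1 | h), then a binary sample of the visible layer (reconstruction).
    p_v = sigmoid(a + h @ W.T)
    return p_v, (rng.random(p_v.shape) < p_v).astype(float)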

2.2 Training process

As seen in (6) and (7), once the RBM model (i.e. ${\boldsymbol W} = ({W_{i,j}})_{i,j = 1}^{{g_v},{g_h}}$, ${\boldsymbol a} = ({a_i})_{i = 1}^{{g_v}}$ and ${\boldsymbol b} = ({b_j})_{j = 1}^{{g_h}})$ is fixed, we can compute the conditional probability of the hidden layer ${\boldsymbol h}$ from the given visible training vectors ${\boldsymbol v}$, and can then sample ${\boldsymbol h}$ from this distribution. Conversely, we can compute the conditional probability of ${\boldsymbol v}$ from the vector ${\boldsymbol h}$ and compare the result with the training samples to check if the RBM model has been chosen correctly. Our task is now to determine ${\boldsymbol W}$, ${\boldsymbol a}$, and ${\boldsymbol b}$. For this purpose, the objective is to increase $P({\boldsymbol v})$ in (5) for the training data ${\boldsymbol V} = \{ {{\boldsymbol v}^{(1)}}, \ldots ,{{\boldsymbol v}^{(m)}}\} $, i.e. gv = m. Therefore, we consider the log-likelihood objective function

$$\mathrm{{\cal L}}({\boldsymbol W},{\boldsymbol a},{\boldsymbol b}|{\boldsymbol V}): = \sum\limits_{\ell = 1}^m \ln (P({{\boldsymbol v}^{(\ell )}}|{\boldsymbol W},{\boldsymbol a},{\boldsymbol b}))$$
that needs to be maximized. To this end, the mean of the derivatives with respect to ${W_{i,j}}$ over the training data ${\boldsymbol V} = \{ {{\boldsymbol v}^{(1)}}, \ldots ,{{\boldsymbol v}^{(m)}}\} $ is: (see Proof B)
$$\frac{1}{m}\sum\limits_{\ell = 1}^m \frac{{\partial \ln \mathrm{{\cal L}}({\boldsymbol W},{\boldsymbol a},{\boldsymbol b}|{{\boldsymbol v}^{(\ell )}})}}{{\partial {W_{i,j}}}} = \frac{1}{m}\sum\limits_{\ell = 1}^m v_i^{(\ell )}P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - \sum\limits_{\boldsymbol v} P({\boldsymbol v}){v_i}{\kern 1pt} P({h_j} = 1|{\boldsymbol v}).$$

As seen in (9), the first term in the derivative can be computed; it depends on the training data and on ${\boldsymbol W}$ and ${\boldsymbol b}$. However, the second term depends on the (unknown) RBM model distribution $P({\boldsymbol v})$ and is intractable to compute, since we would have to sum over ${2^{{g_v}}}$ possible states. The two terms in (9) are often denoted shortly by ${\langle {v_i}{h_j}\rangle _{\textrm{data}}}$ and ${\langle {v_i}{h_j}\rangle _{\textrm{model}}}$ such that we obtain

$$\frac{1}{m}\sum\limits_{\ell = 1}^m \frac{{\partial \ln \mathrm{{\cal L}}({\boldsymbol W},{\boldsymbol a},{\boldsymbol b}|{{\boldsymbol v}^{(\ell )}})}}{{\partial {W_{i,j}}}} = {\langle {v_i}{h_j}\rangle _{\textrm{d}ata}} - {\langle {v_i}{h_j}\rangle _{\textrm{m}odel}}.$$

Here, the brackets represent the calculation of the expectations of ${v_i}{h_j}$ with respect to the distributions $P({\boldsymbol h}|{\boldsymbol v})Q({\boldsymbol v})$ on the one hand and to $P({\boldsymbol h},{\boldsymbol v})$ on the other hand, where $Q({\boldsymbol v})$ denotes the empirical distribution of the data set ${\boldsymbol V}$.

For the derivatives with respect to the bias parameters $a_i$ and $b_j$ we find (see Proof C):

$$\begin{aligned} \frac{{\partial \ln \mathrm{{\cal L}}({\boldsymbol W},{\boldsymbol a},{\boldsymbol b}|{{\boldsymbol v}^{(\ell )}})}}{{\partial {a_i}}} &= v_i^{(\ell )} - \sum\limits_{\boldsymbol v} P({\boldsymbol v}){\kern 1pt} {v_i} \\ \frac{{\partial \ln \mathrm{{\cal L}}({\boldsymbol W},{\boldsymbol a},{\boldsymbol b}|{{\boldsymbol v}^{(\ell )}})}}{{\partial {b_j}}}& = P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - \sum\limits_{\boldsymbol v} P({\boldsymbol v}){\kern 1pt} P({h_j} = 1|{\boldsymbol v}).\end{aligned} $$

Also here, we can take the mean over all training samples ${{\boldsymbol v}^{(\ell )}}$. To approximate the RBM log-likelihood gradient, we employ the method of contrastive divergence [34]. Since the RBM distribution is unknown to us, we start with an initial state for ${\boldsymbol W}$, ${\boldsymbol a}$ and ${\boldsymbol b}$, and will iteratively improve these parameters. We will iteratively update:

$$\begin{aligned} W_{i,j}^{(k + 1)} &= W_{i,j}^{(k)} + \epsilon \Delta W_{i,j}^{(k)},\quad \quad i = 1, \ldots ,{g_v},{\kern 1pt} j = 1, \ldots ,{g_h}, \\ a_i^{(k + 1)}& = a_i^{(k)} + \epsilon \Delta a_i^{(k)},\quad \quad i = 1, \ldots ,{g_v}, \\ b_j^{(k + 1)}& = b_j^{(k)} + \epsilon \Delta b_j^{(k)},\quad \quad j = 1, \ldots ,{g_h}. \end{aligned}$$

Here $\epsilon < 1$ is called the learning rate. As proposed in [33], we initialize all components $W_{i,j}^{(0)}$ by taking random values chosen from a zero-mean Gaussian with standard deviation $0.01$. Further, we initialize $a_i^{(0)}$ as $\log ({p_i}/(1 - {p_i}))$, where ${p_i}$ denotes the proportion of training vectors ${{\boldsymbol v}^{(\ell )}} \in {\boldsymbol V}$ with $v_i^{(\ell )} = 1$. The bias ${{\boldsymbol b}^{(0)}}$ of the vector of hidden units is initialized by ${\boldsymbol 0}$.

The idea of contrastive divergence is to estimate the second sums in equations (9) and (11) using the current update of the RBM model applied to one training vector.

More exactly, for $\ell = 1, \ldots ,m$, we take the training sample ${{\boldsymbol v}^{(\ell )}}$ from ${\boldsymbol V}$ and compute $P({h_j} = 1|{{\boldsymbol v}^{(\ell )}})$ for $j = 1, \ldots ,{g_h}$ via (6) (using the latest known update of the RBM model with ${{\boldsymbol W}^{(k)}}$, ${{\boldsymbol a}^{(k)}}$, ${{\boldsymbol b}^{(k)}}$), and sample ${{\boldsymbol h}^{(\ell )}}$ from the obtained distribution. Then, we apply (7) to compute $P({v_i} = 1|{{\boldsymbol h}^{(\ell )}})$ for $i = 1, \ldots ,{g_v}$ and sample ${\tilde{{\boldsymbol v}}^{(\ell )}}$ from the obtained distribution. The new update values for the model parameters are now computed as

$$\begin{aligned} \Delta W_{i,j}^{(k)} &= \frac{1}{m}\sum\limits_{\ell = 1}^m ({v_i^{(\ell )}P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - \tilde{v}_i^{(\ell )}P({h_j} = 1|{{\tilde{{\boldsymbol v}}}^{(\ell )}})} ), \\ \Delta a_i^{(k)}& = \frac{1}{m}\sum\limits_{\ell = 1}^m ({v_i^{(\ell )} - \tilde{v}_i^{(\ell )}} ), \\ \Delta b_j^{(k)} &= \frac{1}{m}\sum\limits_{\ell = 1}^m ({P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - P({h_j} = 1|{{\tilde{{\boldsymbol v}}}^{(\ell )}})} ). \end{aligned}$$

The algorithm for 1-step contrastive divergence can be summarized as follows:

Algorithm 1. 1-step Contrastive Divergence
Input: current RBM (${{\boldsymbol W}^{(k)}}$, ${{\boldsymbol a}^{(k)}}$, ${{\boldsymbol b}^{(k)}}$), training batch ${\boldsymbol V} = ({{\boldsymbol v}^{(1)}}, \ldots ,{{\boldsymbol v}^{(m)}})$.
Output: gradient approximations $\Delta W_{i,j}^{(k)}$, $\Delta a_i^{(k)}$ $\Delta b_j^{(k)}$, $i = 1, \ldots {g_v}$, $j = 1, \ldots {g_h}$.
Initialize $\Delta W_{i,j}^{(k)} = 0$, $\Delta a_i^{(k)} = 0$ $\Delta b_j^{(k)} = 0$, $i = 1, \ldots {g_v}$, $j = 1, \ldots {g_h}$
for $\ell = 1, \ldots ,m$
for $j = 1, \ldots ,{g_h}$ do sample $h_j^{(\ell )}\mathrm{\ \sim }P({h_j} = 1|{{\boldsymbol v}^{(\ell )}})$ via (6) using the current RBM; end(for)
  for $i = 1, \ldots ,{g_v}$ do sample $\tilde{v}_i^{(\ell )}\mathrm{\ \sim }P({v_i} = 1|{{\boldsymbol h}^{(\ell )}})$ via (7) using the current RBM; end(for)
  for $i = 1, \ldots ,{g_v}$, $j = 1, \ldots ,{g_h}$ do
                $\Delta w_{i,j}^{(k)} = \Delta w_{i,j}^{(k)} + \frac{1}{m}({v_i^{(\ell )}P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - \tilde{v}_i^{(\ell )}P({h_j} = 1|{{\tilde{{\boldsymbol v}}}^{(\ell )}})} );$
  end(for)
  for $i = 1, \ldots ,{g_v}$ do $\Delta a_i^{(k)} = \Delta a_i^{(k)} + \frac{1}{m}({v_i^{(\ell )} - \tilde{v}_i^{(\ell )}} )$ end(for)
  for $j = 1, \ldots ,{g_h}$ do $\Delta b_j^{(k)} = \Delta b_j^{(k)} + \frac{1}{m}({P({h_j} = 1|{{\boldsymbol v}^{(\ell )}}) - P({h_j} = 1|{{\tilde{{\boldsymbol v}}}^{(\ell )}})} )$ end(for)
end(for)

To train the parameters of the RBM, we may employ several training batches and several epochs. The updating procedure in Algorithm 1 is iterated until the update values have a modulus smaller than a predetermined bound (or until an upper bound on the number of iterations is reached). A detailed explanation of the formulation of the RBM can be found in Supplement 1.
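A compact Python version of Algorithm 1 combined with the parameter update (12) might look as follows; it is a sketch under the same assumptions as above (the batch V stored as an m × g_v array), not the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(V, W, a, b, lr=0.5, rng=None):
    # One sweep of 1-step contrastive divergence over the batch V (shape m x g_v),
    # returning the updated parameters according to the update rules above.
    rng = rng or np.random.default_rng()
    p_h = sigmoid(b + V @ W)                               # P(h = 1 | v) for every sample
    h = (rng.random(p_h.shape) < p_h).astype(float)        # sampled hidden states
    p_v = sigmoid(a + h @ W.T)                             # P(v = 1 | h)
    v_tilde = (rng.random(p_v.shape) < p_v).astype(float)  # reconstructed samples
    p_h_tilde = sigmoid(b + v_tilde @ W)                   # P(h = 1 | v~)
    m = V.shape[0]
    dW = (V.T @ p_h - v_tilde.T @ p_h_tilde) / m           # estimate of <v_i h_j>_data - <v_i h_j>_model
    da = (V - v_tilde).mean(axis=0)
    db = (p_h - p_h_tilde).mean(axis=0)
    return W + lr * dW, a + lr * da, b + lr * db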

3. Contrast enhancement of optical coherence tomography B-scans

The purpose of image enhancement methods is to change one or more features of the image so that the resulting image is superior with respect to particular criteria. In this section, we focus on B-scan contrast enhancement using the RBM as an unsupervised learning method. When an RBM is trained to approximate a degraded image, the meaningful patterns of the image are learned with priority over random patterns such as noise. Training the RBM thus maps the input to the most important features of the image, so the RBM can be utilized to enhance noisy images.

For our application of contrast enhancement, we need to generalize the visible data vector ${\boldsymbol v}$ to a vector of real-valued data scaled to $[0,1]$. For a grayscale image the range $[0,1]$ is then replaced by $\{ 0,\frac{1}{{255}}, \ldots ,\frac{{254}}{{255}},1\} $ and is still finite. The computations above remain the same in the training process, while in the reconstruction we employ the rectified linear unit (ReLU) function instead of the sigmoid function, where ReLU$(x): = \max \{ 0,{\kern 1pt} x\} $.

Dataset: In this section, three datasets have been used for the evaluation of the proposed algorithms. The first dataset includes the data of the Topcon 3D OCT-1000 from Feyz Ophthalmology Hospital in Isfahan [18]. This OCT dataset is available at: https://misp.mui.ac.ir/fa/node/1368. The data related to the Topcon device were obtained from 13 healthy individuals and include 128 two-dimensional B-scans for each individual. Each volume has dimensions of 650 ${\times}$ 512 ${\times}$ 128, with a size of 7 ${\times}$ 3.125 mm2. The size of each pixel is equivalent to 4.81 ${\times}$ 13.67 µm2. Five individuals were randomly selected, and for each selected individual, 10 normal two-dimensional B-scans were randomly selected and tested. The second OCT dataset includes the data of the Zeiss imaging systems [35]. This OCT dataset is available at: https://osf.io/5wyr3/. The data were obtained in a study approved by the Ethics Committee of the Sestre-milosrdnice University Hospital Center and the Faculty of Electrical Engineering and Computing (Zagreb, Croatia). The data related to the Zeiss device were captured from 24 different subjects and consist of 128 B-scans per individual. The resolution of each B-scan is 1024×512 pixels (with a pixel size of 1.96 µm×11.74 µm). The third OCT dataset is from Duke University and includes the data of the Bioptigen imaging systems [4]. This OCT dataset is available at: http://people.duke.edu/∼sf59/Fang_BOE_2012.htm. The data related to the Bioptigen device were obtained from 17 different subjects and include, for each noisy volumetric scan, an averaged foveal image that is considered as a clean foveal B-scan.

Normalization: Using the mean-variance normalization method, the pixel values of each image are normalized. With this method, the pixel values between 0 and 255 are converted to real values with unit variance, and the visibility of some layers is improved:

$$X = \frac{{X - mean(X)}}{{std(X)}} + \beta$$
where $\beta $ is a parameter between 0 and 1. As $\beta $ is increased, the visibility of the layers after reconstruction is increased.
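A one-line implementation of this normalization (a sketch, not the authors' code; images are assumed to be NumPy arrays) is:

import numpy as np

def normalize(image, beta=0.0):
    # Mean-variance normalization of the formula above; beta in [0, 1] shifts the result.
    image = image.astype(np.float64)
    return (image - image.mean()) / image.std() + beta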

Network: Each normalized image is independently learned by the RBM network with 512 ${\times}$ 512 visible units and 120 hidden units and a learning rate of 0.5, and is eventually reconstructed. In this case we have gv = 512 ${\times}$ 512 and gh = 120. Reconstruction of the images by the RBM network leads to improved image contrast.

Figure 2 and Fig. 3 show examples of images together with the images reconstructed by the contrast enhancement method of this paper. According to Fig. 3, by increasing $\beta $ from 0 to 1, the visibility of the layers is increased.

Fig. 2. Contrast enhancement results of OCT image samples from the Topcon device ($\beta = 0$). From left to right: original image (and a sample zoomed area), proposed method (and a sample zoomed area).

Fig. 3. Contrast enhancement results of OCT image samples from the Topcon device. From left to right: original image, results of the proposed method for $\beta = 0$, and for $\beta = 1$.

The pseudocode of the proposed method for contrast enhancement is summarized in Algorithm 2.

Algorithm 2. OCT image enhancement
1) Normalize the image as follows:
$\displaystyle X = \frac{{X - mean(X)}}{{std(X)}} + \beta$
2) Fit the RBM to the normalized image using 1-step contrastive divergence.
3) Start with a normalized image on the visible units of the trained RBM.
4) Compute the hidden units' probabilities in parallel as follows:
$\displaystyle P({h_j} = 1|{\boldsymbol v}) = sigmoid({b_j} + \sum\limits_i {{v_i}{w_{ij}}})$
5) Compute all the visible units in parallel as follows to get an enhanced image reconstruction:
$\displaystyle {v_i} = ReLU({a_i} + \sum\limits_j {{h_j}{w_{ij}}} )$
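A possible Python rendering of steps 4 and 5 of Algorithm 2 is sketched below. Whether the hidden units are sampled or their probabilities are used directly in step 5 is not specified above; the sketch uses the probabilities, which is a common deterministic choice, and is our illustration rather than the authors' implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def reconstruct_enhanced(v, W, a, b):
    # Steps 4-5 of Algorithm 2: hidden probabilities via the sigmoid,
    # then a ReLU reconstruction of the visible layer (the enhanced image).
    p_h = sigmoid(b + v @ W)                  # hidden units' probabilities
    return np.maximum(0.0, a + p_h @ W.T)     # ReLU(a_i + sum_j h_j w_ij)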

3.1. Evaluation

In this section, the performance of the method is compared with previous superior approaches in the same field. Such an evaluation can be done both visually and based on quantitatively defined criteria.

3.1.1 Evaluation on real OCT data

For better evaluation, the method proposed by Amini et al. [36] was selected as a strong method for improving the contrast of OCT images, and its performance was compared with our proposed method. Reference [36] argues that, due to the layered structure of the retina, a mixture distribution is an appropriate statistical model for an OCT image. The distribution of each component of this model is Normal-Laplace, and the authors call this model a normal-Laplace mixture (NLM) model. Moreover, they propose a new contrast enhancement method based on the NLM model, which we call the CE-NLM method. There are few methods for OCT contrast enhancement. In [36], the results of the CE-NLM method are also compared to other image enhancement methods such as histogram shaping and CLAHE, and the results show that CE-NLM outperforms these methods. Our proposed method and the CE-NLM method were tested on the Topcon dataset. Their visual results can be seen for two data samples in Fig. 4.

Fig. 4. Contrast enhancement results of an OCT image sample from the Topcon device. In each row, from left to right: original image, CE-NLM method [36], proposed method.

In order to quantitatively evaluate the degree of contrast improvement achieved by each of the above two methods, two criteria were used: the contrast-to-noise ratio (CNR) and edge preservation (EP).

CNR is a measure of the contrast between a region of interest (ROI) and the background area. For the m-th ROI, CNR is defined as follows [37]:

$$CNR = 10\log \frac{{{\mu _m} - {\mu _b}}}{{\sqrt {\sigma _m^2 + \sigma _b^2} }},$$
where µm and σm are the mean and standard deviation of the m-th ROI, and µb and σb are the mean and standard deviation of the background area.
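For reference, a possible implementation of this measure is sketched below (ours, assuming a base-10 logarithm and ROIs passed as pixel arrays; the ROI mean is assumed to exceed the background mean):

import numpy as np

def cnr(roi, background):
    # Contrast-to-noise ratio of one ROI against the background region.
    mu_m, mu_b = roi.mean(), background.mean()
    sigma_m, sigma_b = roi.std(), background.std()
    return 10.0 * np.log10((mu_m - mu_b) / np.sqrt(sigma_m**2 + sigma_b**2))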

The EP measures the correlation of the edges in the image before and after noise removal in the ROI. The EP measure for the m-th ROI is calculated as follows [37]:

$$E{P_m} = \frac{{\Gamma ({\Delta I_m^{\prime} - \overline {\Delta I_m^{\prime}} ,\Delta {I_m} - \overline {\Delta {I_m}}} )}}{{\sqrt {\Gamma ({\Delta I_m^{\prime} - \overline {\Delta I_m^{\prime}} ,\Delta I_m^{\prime} - \overline {\Delta I_m^{\prime}}} )\,\Gamma ({\Delta {I_m} - \overline {\Delta {I_m}} ,\Delta {I_m} - \overline {\Delta {I_m}}} )} }},$$
where $I_m$ are sub-matrices that contain the processed (denoised) image regions in the m-th ROI and $I_m^{\prime}$ are sub-matrices that contain the original noisy image regions in the m-th ROI. The operator $\Delta$ denotes the Laplacian operator; in practice, $\Delta I_m$ is obtained by convolving $I_m$ with a standard 3×3 approximation of the Laplacian, and $\overline {\Delta {I_m}}$ is the mean of $\Delta I_m$. $\Gamma$ represents the correlation operator inside the ROI:
$$\Gamma ({I_1},{I_2}) = \sum\limits_{(i,j) \in ROI} {{I_1}(i,j){I_2}(i,j)} .$$

EP is calculated by averaging the edge preservation measure over the M selected ROIs:

$$EP = \frac{1}{M}\sum\limits_{m = 1}^M {E{P_m}} .$$

The EP value can be measured in different ROIs, and its average lies between zero and one. Small values of this criterion are obtained for blurred edges in the ROI. Areas selected for the EP calculation should contain useful edge information. Figure 5 shows an example of how to select appropriate ROIs for each of the above criteria.

These ROIs include the details and features considered by each metric. For example, the areas selected for CNR lie between different layers to show contrast changes, and the EP ROIs are drawn to include edge information. Another point is that these areas are drawn in exactly the same place in the original image and in the enhanced image of each method, so that a comparison of the improvements is possible.
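The per-ROI EP computation above can be sketched in Python as follows (our illustration; the Laplacian filter is taken from scipy.ndimage, and the two inputs are the original and processed ROI sub-matrices). The values returned by this function are then averaged over the M selected ROIs.

import numpy as np
from scipy.ndimage import laplace

def edge_preservation(roi_original, roi_processed):
    # EP_m: correlation of mean-removed Laplacian responses before and after processing.
    d_orig = laplace(roi_original.astype(np.float64))
    d_proc = laplace(roi_processed.astype(np.float64))
    d_orig -= d_orig.mean()
    d_proc -= d_proc.mean()
    gamma = lambda x, y: np.sum(x * y)        # correlation operator inside the ROI
    return gamma(d_orig, d_proc) / np.sqrt(gamma(d_orig, d_orig) * gamma(d_proc, d_proc))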

Fig. 5. Selected ROIs for calculating (left) EP and (right) CNR. The larger circle outside the retina is used as the background ROI, and the other circles are the foreground ROIs.

In addition, Agaian et al. [38] introduced the evaluation measure of enhancement (EME) as an evaluation measure for contrast enhancement. This measure is based on entropy and is defined as follows:

$$EM{E_{_{\alpha ,{k_1}{k_2}}}} = \frac{1}{{{k_1}{k_2}}}{\sum\limits_{i = 1}^{{k_1}} {\sum\limits_{j = 1}^{{k_2}} {\alpha \times \left( {\frac{{I_{{{\max }_{i,j}}}^w}}{{I_{{{\min }_{i,j}}}^w + c}}} \right)} } ^\alpha }\ln \frac{{I_{{{\max }_{i,j}}}^w}}{{I_{{{\min }_{i,j}}}^w + c}}$$
where the image I is divided into ${k_1} \times {k_2}$ blocks, $\alpha $ is an enhancement parameter between 0 and 0.99, $I_{{{\max }_{i,j}}}^w$ and $I_{{{\min }_{i,j}}}^w$ denote the maximum and minimum intensities in a block, and c is a small constant to prevent division by 0. In fact, EME measures the entropy or information in the image contrast and is directly related to the contrast: as the contrast of an image increases, the EME also increases.
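A straightforward block-wise implementation of EME is sketched below (ours; the default block grid, α and c are arbitrary assumptions for illustration):

import numpy as np

def eme(image, k1=8, k2=8, alpha=0.5, c=1e-4):
    # Entropy-like contrast measure over a k1 x k2 grid of image blocks.
    total = 0.0
    for rows in np.array_split(image.astype(np.float64), k1, axis=0):
        for block in np.array_split(rows, k2, axis=1):
            ratio = block.max() / (block.min() + c)
            if ratio > 0:                      # guard against degenerate blocks
                total += alpha * ratio**alpha * np.log(ratio)
    return total / (k1 * k2)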

The CNR, EP and EME results averaged over 10 B-scans of each of 5 subjects in the Topcon dataset have been calculated. The averaged results on all slices from the Topcon device are shown in Table 1. We also tested our algorithm on B-scans from the Zeiss device; the results averaged over 50 Zeiss B-scans are also displayed in Table 1.

Table 1. Evaluation measure results, averaged over 50 Topcon device OCT B-scans and 50 Zeiss device OCT B-scans

According to Table 1, the performance of the proposed method in terms of the CNR and EME criteria and contrast improvement on the Topcon and Zeiss datasets was significantly better than that of the method in [36] (p < 0.05). Regarding the EP criterion, the performance of the two methods was close. Therefore, in general, the proposed method for contrast enhancement of OCT images has acceptable performance both qualitatively and quantitatively.

3.1.2 Evaluation on benchmark data

In this section we use the OCT dataset from Duke University that includes the data of the Bioptigen imaging system [4]. The data were obtained from 17 different subjects. Our proposed method is applied to the low signal-to-noise ratio (SNR) images in this dataset, and then a uniform filter is applied in which each pixel's value is replaced by the mean value of the 9×9 area centered at the pixel's location. Our proposed method is compared to the following state-of-the-art methods from [4], [39], [40] and [6]:

In [4], a sparse representation dictionary is learned for each of the high-SNR images. The dictionaries are utilized to denoise the low-SNR B-scans. The authors call this method multiscale sparsity based tomographic denoising (MSBTD). In real clinical applications, an ideal noiseless image is not available; however, in [4] the averaged foveal image is considered as a noise-free version of the noisy foveal B-scan. In [6], a multivariate statistical model for describing OCT images is proposed. The proposed model is a sparse version of the spatially constrained Gaussian mixture model (SC-GMM) method [41] and is named sparse SC-GMM. In [40], the variational auto-encoder (VAE) is introduced. This model is a particular type of probabilistic graphical model combined with variational Bayesian methods. The VAE has two processes, encoding and decoding. In the encoding process, the input information is compressed into a constrained multivariate latent distribution; in the decoding process, the input information is reconstructed as accurately as possible. Our proposed method and the abovementioned methods were tested on the Bioptigen dataset. Their visual results can be seen for a sample data in Fig. 6. The comparison of the CNR, EP and EME results is given in Table 2.

Fig. 6. Visual comparison of different methods: a) noisy image, b) proposed method, c) sparse SC-GMM [6], d) VAE [40] and e) MSBTD [4].

Table 2. Evaluation measure results for Bioptigen OCT retinal images

The comparison of the CNRs, EPs, and EMEs in Table 2 shows that the proposed method significantly improves the image contrast and outperforms the state-of-the-art methods (p < 0.005). The VAE and sparse SC-GMM methods preserve the tissue less well, and MSBTD needs a clean image for learning. The evaluation results show that the proposed method improves the contrast, preserves the edges and maintains the tissue without any need for a clean image for learning. Moreover, in the proposed method the visibility of some layers can be increased or decreased after reconstruction as β is increased or decreased. Using this property, we employ the proposed method to detect hyperreflective foci in Section 4.

4. Hyperreflective foci detection in optical coherence tomography B-scans

Hyperreflective foci (HF) are one of the first signs of disease that appear as bright spots in OCT images. HFs are distributed in the retinal layers of patients with diabetic macular edema (DME) and can be traced in images containing retinal layer information such as OCT B-scans. In addition, the location of the HFs in the retinal layers can help predict the progression of the disease [42]. Figure 7 shows HF samples in a patient with DME. The problem of HF segmentation is even more challenging in low-contrast images.

Fig. 7. Sample HFs in a DME patient.

In this section, we focus on identifying HFs in B-scans using an RBM as an unsupervised learning method. In order to improve the HF segmentation, we first enhance the image contrast by the proposed image enhancement method, in which the image is reconstructed by the RBM through defining a target function and minimizing it. To that end, we first describe the role of the RBM in image contrast enhancement. Then, we use the GMM algorithm on the reconstructed images to cluster HF points. By removing the candidate points outside the central retinal layers (between the RNFL and RPE layers), the remaining regions are the final HF candidates. We evaluate the performance of our method against other techniques and, finally, discuss the effect of the proposed RBM-based image contrast enhancement on HF segmentation.

Dataset: The dataset of the Heidelberg OCT imaging system from Duke University is used for HF detection. These data were obtained from ten patients with DME with lateral resolutions from 10.94 to 11.98 µm/pixel and azimuthal resolutions from 118 to 128 µm/pixel [16]. From this set of unhealthy data, 60 OCT image slides (6 slides per person) were randomly selected for HF detection. The OCT dataset used in this study for HF segmentation is available at: http://people.duke.edu/∼sf59/Chiu_BOE_2014_dataset.htm.

Normalization: Using the mean-variance normalization method, the pixel values of each image are normalized. With this method, the pixel values between 0 and 255 are converted to real values with zero mean and unit variance.

Network: Each image is learned independently by the RBM network with a 128 ${\times}$ 128 ${\times}$ 120 structure and is eventually reconstructed. The image contrast is improved through this reconstruction.

Clustering: When we normalize the image using mean-variance normalization and reconstruct the normalized image using the RBM, the HFs, RNFL and RPE are enhanced more than the other regions. Using the Gaussian mixture model (GMM) algorithm to cluster the regions, the HFs, RNFL and RPE fall into one cluster and the other regions fall into another cluster.

Removing the RNFL and RPE: The reconstructed image includes the primary HF candidate points. In order to reduce the false positive rate of the candidate areas, the candidate points that lie outside the middle layers of the retina, between the RNFL and RPE layers, are removed. Mistaken candidate points appear for two reasons. First, the structure of the RNFL layer (the first layer of the retina) has distinct points which resemble the texture of the input image. Second, the presence of shadows in the RPE layer (the last retinal layer) makes it resemble HF. When we cluster the reconstructed image, these incorrect candidate points are also selected as HF. Therefore, to reduce the false recognition rate, we limit our choice to the region between the RNFL layer and the top of the RPE and remove the other points from the final output. For the segmentation of the OCT B-scan layers, we used the algorithm described in detail in [43], namely average intensity variation-based segmentation (AIVBS). The AIVBS algorithm consists of three main phases: pre-processing, boundary pixel determination and interpolation. After using the AIVBS algorithm to detect the boundaries of the two surfaces of the RNFL and RPE, we only keep the detected points between the RNFL and RPE to obtain the final HFs. Figure 8 shows the result of extracting HFs for a sample B-scan.

Fig. 8. HF extraction stages. From left to right: original image, reconstructed image using the RBM, detecting and removing the RPE and RNFL, extracted HFs.

The pseudocode of the proposed method for HF extraction is as follows:

Algorithm 3. HF Extraction
1) Normalize the image using mean-var.
2) Fit the RBM to the normalized image using 1-step Contrastive Divergence.
3) Start with a normalized image on the visible units of trained RBM.
4) Compute hidden units’ probability in parallel as follows.
$\displaystyle P({h_j} = 1|{\boldsymbol v}) = sigmoid({b_j} + \sum\limits_i {{v_i}{w_{ij}}} )$
5) Compute all the visible units in parallel as follows to get a reconstruction
$\displaystyle {v_i} = ReLU({a_i} + \sum\limits_j {{h_j}{w_{ij}}} )$
6) Apply GMM and cluster the regions. HF, RNFL and RPE are classified in one cluster and other regions are classified in another cluster.
7) RNFL and RPE boundaries are detected by the AIVBS OCT segmentation method [43].
8) Deletion of candidate points that lie outside the middle layer of the retina between the RNFL and RPE layers.
9) The remaining candidates are the extracted HFs.
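Steps 6 and 8 of Algorithm 3 can be sketched as follows (our illustration using scikit-learn's GaussianMixture as the GMM implementation; rnfl_boundary and rpe_boundary are assumed to be per-column row indices returned by a layer segmentation such as AIVBS):

import numpy as np
from sklearn.mixture import GaussianMixture

def hf_candidates(reconstructed, rnfl_boundary, rpe_boundary):
    # Two-component GMM clustering of the reconstructed pixel intensities: the brighter
    # cluster contains HF, RNFL and RPE; candidates outside the RNFL-RPE band are removed.
    gmm = GaussianMixture(n_components=2, random_state=0)
    labels = gmm.fit_predict(reconstructed.reshape(-1, 1)).reshape(reconstructed.shape)
    bright = int(np.argmax(gmm.means_.ravel()))       # cluster holding HF, RNFL and RPE
    candidate_mask = labels == bright
    rows = np.arange(reconstructed.shape[0])[:, None]
    inside = (rows > rnfl_boundary[None, :]) & (rows < rpe_boundary[None, :])
    return candidate_mask & inside                    # remaining candidates are the extracted HFs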

4.1 Results

Identifying the HFs occurs in two stages. The first stage is extracting the candidate segments from the retinal layers, and the second stage is removing the points in the RNFL and RPE layers from the result. Each image is learned independently by the RBM network with a 128 ${\times}$ 128 ${\times}$ 120 structure and is eventually reconstructed. Using the GMM algorithm to cluster the regions, the HFs, RNFL and RPE fall into one cluster and the other regions into another cluster. The false positive rate of the candidate areas is reduced by deleting the candidate points that lie outside the middle layers of the retina, between the RNFL and RPE layers. Figure 9 shows the result of extracting HFs for several B-scan samples. If healthy data are used, the output of the proposed method is a black image with no detected HF.

Fig. 9. HF extraction for B-scan samples. First row: original images. Second row: extracted HFs.

The experiments were run on a workstation with an Intel Xeon CPU, one 11 GB Nvidia GTX 1080Ti GPU and 32 GB of RAM. The processing time for each volume is about 3 ms.

To estimate the efficiency of the proposed method, we measure the precision, recall and Dice coefficients as follows:

$$Precision = \frac{{TP}}{{TP + FP}}$$
$$Recall = \frac{{TP}}{{TP + FN}}$$
$$DSC = \frac{{2TP}}{{2TP + FP + FN}}$$
where TP, TN, FP and FN represent true positives, true negatives, false positives, and false negatives, respectively.
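These pixel-wise measures can be computed from binary prediction and ground-truth masks as in the following sketch (ours, not the authors' code):

import numpy as np

def segmentation_scores(prediction, ground_truth):
    # Pixel-wise precision, recall and Dice coefficient for binary HF masks.
    prediction = prediction.astype(bool)
    ground_truth = ground_truth.astype(bool)
    tp = np.sum(prediction & ground_truth)
    fp = np.sum(prediction & ~ground_truth)
    fn = np.sum(~prediction & ground_truth)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    dice = 2 * tp / (2 * tp + fp + fn)
    return precision, recall, dice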

The HFs in our database are labeled by an expert, and the algorithm results are compared with the labeled values. Each pixel is labeled as positive/negative, and the precision and recall are calculated for the segmentation experiments. The precision measure determines what fraction of the pixels (or voxels) segmented as HF by the proposed method are truly HF, and the recall measure determines what fraction of the true HF pixels (or voxels) are correctly segmented by the proposed method. The Dice coefficient is a useful measure for segmentation experiments which determines the similarity between two images. The proposed algorithm segments the HFs with a precision of 80%, a recall of 84% and a Dice coefficient of 82.09%.

Figure 10 and Fig. 11 show cases where the algorithm fails to detect HFs correctly. The red points are those under-segmented or over-segmented by the proposed method. Under-segmentation can occur due to low intensities, while over-segmentation usually occurs due to the presence of vessels.

Fig. 10. Under-segmentation results. From left to right: sample B-scan, ground truth, prediction of the proposed method, under-segmented points.

Fig. 11. Over-segmentation results. From left to right: sample B-scan, ground truth, prediction of the proposed method, over-segmented points.

In Table 3, the performance of the method is compared with previous superior approaches in the same field. To the best of our knowledge, there are few previous works on automatic HF segmentation. Using a residual U-Net, Schlegl et al. [44] segment HFs in OCT images. Varga et al. [45] use several networks such as an FCN for HF segmentation. Xie et al. [46] modify the 3D U-Net and extract HFs. We apply different transformations including shearing, zooming and horizontal flipping in order to augment the data of the Duke dataset, fine-tune the pre-trained models with our dataset, and randomly split the dataset into training and test sets (30% for testing and the remainder for training). All these approaches are supervised deep learning methods, whereas we focus on identifying HFs in B-scans using the RBM as an unsupervised learning method. As illustrated in Table 3, the proposed method outperforms all previous works.

Table 3. Evaluation measure results: precision, recall and DSC

In order to analyze the impact of the proposed contrast enhancement method, the CE-NLM method was also used to enhance the contrast, and the HF extraction results were compared to the segmentation results obtained without contrast enhancement. The results are shown in Fig. 12. Our experiments show that the segmentation algorithm did not work properly without the proposed contrast enhancement method. This is because, when we normalize the image using mean-variance normalization and reconstruct the normalized image using the RBM, the HFs, RNFL and RPE are enhanced more than the other regions; using the GMM algorithm to cluster the regions, the HFs, RNFL and RPE then fall into one cluster and the other regions fall into another cluster, and after removing the RNFL and RPE the remaining points represent the HFs. However, when using CE-NLM for contrast enhancement, or no contrast enhancement at all, the texture points of the middle layers and the HFs are categorized in one cluster, so the segmentation method cannot detect the HFs properly.

Fig. 12. HF extraction for B-scan samples. First row: original images. Second row: extracted HFs using the proposed contrast enhancement method. Third row: extracted HFs without contrast enhancement. Fourth row: extracted HFs using the CE-NLM method for contrast enhancement.

5. Conclusions

Deep learning methods have achieved excellent performance in image analysis; however, it is hard to extend them to OCT image analysis. Most deep learning-based image analysis models are supervised approaches and need a high volume of training data that is hard to acquire in OCT image analysis. Since acquiring reference clean images for OCT image enhancement and accurately annotating a high volume of OCT images for segmentation are hard, we proposed an unsupervised learning-based approach for OCT image enhancement and for HF detection as an abnormality segmentation task. The important features of this method were illustrated and the numerical efficiency of the method was analyzed.

To evaluate the contrast improvement, qualitative and quantitative criteria such as CNR, EP and EME of the proposed method were compared with previous superior works. For OCT image enhancement, each image is first normalized, then independently learned by the RBM network and eventually reconstructed. In the reconstruction phase we use the ReLU function instead of the sigmoid function. Reconstruction of the images by the RBM network leads to improved image contrast. The results show that the proposed method outperforms the competitive OCT contrast enhancement methods.

For HF segmentation in patients with DME, the image is reconstructed by the RBM through defining a target function in (8) and minimizing it. The GMM algorithm was then used on the reconstructed images to determine the HF points. By eliminating the candidate spots outside the central retinal layers (between the RNFL and RPE layers), the remaining regions are the final HF candidates, which yields the final HF segmentation in patients with DME. Finally, the important features of this method were illustrated and the numerical efficiency of the method was analyzed.

Funding

Iran National Science Foundation (97010062); Isfahan University of Medical Sciences (197143).

Acknowledgments

Part of this work was supported by Alexander von Humboldt Foundation under Georg Forster fellowship for experienced researchers.

Disclosures

The authors declare that there are no conflicts of interest related to this article.

Data availability

Data underlying the results presented in this paper are available in Ref. [4], [16], [18] and [35].

Supplemental document

See Supplement 1 for supporting content.

References

1. J. G. Fujimoto, W. Drexler, J. S. Schuman, and C. K. Hitzenberger, “Optical Coherence Tomography (OCT) in ophthalmology: introduction,” Opt. Express 17(5), 3978–3979 (2009). [CrossRef]  

2. H. Rabbani, R. Kafieh, and Z. Amini, “Optical coherence tomography image analysis,” Wiley encyclopedia of electrical and electronics engineering, 1–16 (1999).

3. M. N. Do and M. Vetterli, “The contourlet transform: an efficient directional multiresolution image representation,” IEEE Trans. on Image Process. 14(12), 2091–2106 (2005). [CrossRef]  

4. L. Fang, S. Li, Q. Nie, J. A. Izatt, C. A. Toth, and S. Farsiu, “Sparsity based denoising of spectral domain optical coherence tomography images,” Biomed. Opt. Express 3(5), 927–942 (2012). [CrossRef]  

5. S. V. M. Sagheer and S. N. George, “A review on medical image denoising algorithms,” Biomed. Signal Process. Control 61, 102036 (2020). [CrossRef]  

6. Z. Amini, H. Rabbani, and I. Selesnick, “Sparse domain Gaussianization for multi-variate statistical modeling of retinal OCT images,” IEEE Trans. on Image Process. 29, 6873–6884 (2020). [CrossRef]  

7. A. Thomas, A. Sunija, R. Manoj, R. Ramachandran, S. Ramachandran, P. G. Varun, and P. Palanisamy, “RPE layer detection and baseline estimation using statistical methods and randomization for classification of AMD from retinal OCT,” Comput. Methods Programs Biomed. 200, 105822 (2021). [CrossRef]  

8. S. Stolte and R. Fang, “A survey on medical image analysis in diabetic retinopathy,” Med. Image Anal. 64, 101742 (2020). [CrossRef]  

9. N. Krishna and K. Nagamani, “Glaucoma diagnosis using state of art image analysis techniques: a comprehensive survey,” in 2020 International Conference on Inventive Computation Technologies (ICICT), (IEEE, 2020), 247–251.

10. A. R. Chowdhury, S. Banerjee, and T. Chatterjee, “A cybernetic systems approach to abnormality detection in retina images using case based reasoning,” SN Appl. Sci. 2(1), 1–15 (2020). [CrossRef]  

11. C. Bhardwaj, S. Jain, and M. Sood, “Diabetic retinopathy lesion discriminative diagnostic system for retinal fundus images,” ABE 9(0), 71–82 (2020). [CrossRef]  

12. R. Saha, A. R. Chowdhury, S. Banerjee, and T. Chatterjee, “Detection of retinal abnormalities using machine learning methodologies,” NNW 28(5), 457–471 (2018). [CrossRef]  

13. M. Monemian and H. Rabbani, “Mathematical analysis of texture indicators for the segmentation of optical coherence tomography images,” Optik 219, 165227 (2020). [CrossRef]  

14. J. Kugelman, D. Alonso-Caneiro, S. A. Read, S. J. Vincent, and M. J. Collins, “Automatic segmentation of OCT retinal boundaries using recurrent neural networks and graph search,” Biomed. Opt. Express 9(11), 5759–5777 (2018). [CrossRef]  

15. L. Fang, S. Li, D. Cunefare, and S. Farsiu, “Segmentation based sparse reconstruction of optical coherence tomography images,” IEEE Trans. Med. Imaging 36(2), 407–421 (2017). [CrossRef]  

16. S. J. Chiu, M. J. Allingham, P. S. Mettu, S. W. Cousins, J. A. Izatt, and S. Farsiu, “Kernel regression based segmentation of optical coherence tomography images with diabetic macular edema,” Biomed. Opt. Express 6(4), 1172–1194 (2015). [CrossRef]  

17. R. Kafieh, H. Rabbani, and S. Kermani, “A review of algorithms for segmentation of optical coherence tomography from retina,” J Med Signals Sens 3(1), 45 (2013). [CrossRef]  

18. R. Kafieh, H. Rabbani, M. D. Abramoff, and M. Sonka, “Intra-retinal layer segmentation of 3D optical coherence tomography using coarse grained diffusion map,” Med. Image Anal. 17(8), 907–928 (2013). [CrossRef]  

19. Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature 521(7553), 436–444 (2015). [CrossRef]  

20. V. Jain and S. Seung, “Natural image denoising with convolutional networks,” Advances in neural information processing systems 21, 769–776 (2008).

21. K. Zhang, W. Zuo, Y. Chen, D. Meng, and L. Zhang, “Beyond a gaussian denoiser: residual learning of deep cnn for image denoising,” IEEE Trans. on Image Process. 26(7), 3142–3155 (2017). [CrossRef]  

22. K. Zhang, W. Zuo, and L. Zhang, “FFDNet: Toward a fast and flexible solution for CNN-based image denoising,” IEEE Trans. on Image Process. 27(9), 4608–4622 (2018). [CrossRef]  

23. S. Guo, Z. Yan, K. Zhang, W. Zuo, and L. Zhang, “Toward convolutional blind denoising of real photographs,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019), 1712–1722.

24. A. Abbasi, A. Monadjemi, L. Fang, H. Rabbani, and Y. Zhang, “Three-dimensional optical coherence tomography image denoising through multi-input fully-convolutional networks,” Comput. Biol. Med. 108, 1–8 (2019). [CrossRef]  

25. S. Ren, K. He, R. Girshick, and J. Sun, “Faster R-CNN: towards real-time object detection with region proposal networks,” IEEE Trans. Pattern Anal. Mach. Intell. 39(6), 1137–1149 (2017). [CrossRef]  

26. L.-C. Chen, Y. Zhu, G. Papandreou, F. Schroff, and H. Adam, “Encoder-decoder with atrous separable convolution for semantic image segmentation,” in Proceedings of the European conference on computer vision (ECCV), 2018), 801–818.

27. V. Badrinarayanan, A. Kendall, and R. Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Trans. Pattern Anal. Mach. Intell. 39(12), 2481–2495 (2017). [CrossRef]  

28. J. Long, E. Shelhamer, and T. Darrell, “Fully convolutional networks for semantic segmentation,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2015), 3431–3440.

29. L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, “Deeplab: Semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected crfs,” IEEE Trans. Pattern Anal. Mach. Intell. 40(4), 834–848 (2018). [CrossRef]  

30. X. Liu, L. Song, S. Liu, and Y. Zhang, “A Review of Deep-Learning-Based Medical Image Segmentation Methods,” Sustainability 13(3), 1224 (2021). [CrossRef]  

31. G. E. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation 18(7), 1527–1554 (2006). [CrossRef]  

32. G. E. Hinton and R. R. Salakhutdinov, “Reducing the dimensionality of data with neural networks,” Science 313(5786), 504–507 (2006). [CrossRef]  

33. G. E. Hinton, “A practical guide to training restricted Boltzmann machines,” in Neural Networks: Tricks of the Trade (Springer, 2012), pp. 599–619.

34. M. A. Carreira-Perpinan and G. E. Hinton, “On contrastive divergence learning,” in Aistats, (Citeseer, 2005), 33–40.

35. M. Melinščak, M. Radmilović, Z. Vatavuk, and S. Lončarić, “Annotated retinal optical coherence tomography images (AROI) database for joint retinal layer and fluid segmentation,” Automatika: časopis za automatiku, mjerenje, elektroniku, računarstvo i komunikacije 62, 375–385 (2021).

36. Z. Amini and H. Rabbani, “Statistical modeling of retinal optical coherence tomography,” IEEE Trans. Med. Imaging 35(6), 1544–1554 (2016). [CrossRef]  

37. A. Pizurica, L. Jovanov, B. Huysmans, V. Zlokolica, P. De Keyser, F. Dhaenens, and W. Philips, “Multiresolution denoising for optical coherence tomography: a review and evaluation,” CMIR 4(4), 270–284 (2008). [CrossRef]  

38. S. S. Agaian, B. Silver, and K. A. Panetta, “Transform coefficient histogram-based image enhancement algorithms using contrast entropy,” IEEE Trans. on Image Process. 16(3), 741–758 (2007). [CrossRef]  

39. Z. Amini and H. Rabbani, “Optical coherence tomography image denoising using Gaussianization transform,” J. Biomed. Opt. 22(08), 1 (2017). [CrossRef]  

40. D. P. Kingma and M. Welling, “An introduction to variational autoencoders,” arXiv preprint arXiv:1906.02691 (2019).

41. M. Niknejad, H. Rabbani, and M. Babaie-Zadeh, “Image restoration using Gaussian mixture models with spatially constrained patch clustering,” IEEE Trans. on Image Process. 24(11), 3624–3636 (2015). [CrossRef]  

42. M. Bolz, U. Schmidt-Erfurth, G. Deak, G. Mylonas, K. Kriechbaum, C. Scholda, and D. R. R. G. Vienna, “Optical coherence tomographic hyperreflective foci: a morphologic sign of lipid extravasation in diabetic macular edema,” Ophthalmology 116(5), 914–920 (2009). [CrossRef]  

43. M. Monemian and H. Rabbani, “Analysis of a novel segmentation algorithm for optical coherence tomography images based on pixels intensity correlations,” IEEE Trans. Instrum. Meas. 70, 1–12 (2021). [CrossRef]  

44. T. Schlegl, H. Bogunovic, S. Klimscha, P. Seeböck, A. Sadeghipour, B. Gerendas, S. M. Waldstein, G. Langs, and U. Schmidt-Erfurth, “Fully automated segmentation of hyperreflective foci in optical coherence tomography images,” arXiv preprint arXiv:1805.03278 (2018).

45. L. Varga, A. Kovács, T. Grósz, G. Thury, F. Hadarits, R. Dégi, and J. Dombi, “Automatic segmentation of hyperreflective foci in OCT images,” Comput. Methods Programs Biomed. 178, 91–103 (2019). [CrossRef]  

46. S. Xie, I. P. Okuwobi, M. Li, Y. Zhang, S. Yuan, and Q. Chen, “Fast and automated hyperreflective foci segmentation based on image enhancement and improved 3D U-Net in SD-OCT volumes with diabetic retinopathy,” Trans. Vis. Sci. Tech. 9(2), 21 (2020). [CrossRef]  
