## Abstract

Blind channel estimation is critical for digital signal processing (DSP) compensation of optical fiber communication links. The overall channel consists of deterministic distortions such as chromatic dispersion, as well as random and time-varying distortions including polarization mode dispersion and timing jitter. Robust acquisition and tracking methods are needed to estimate these distortion effects, which, in turn, can be compensated by means of DSP such as Maximum Likelihood Sequence Estimation (MLSE). Here, a novel blind estimation algorithm is developed, accompanied by comprehensive mathematical modeling, and followed by an extensive set of real-time experiments that quantitatively verify its performance and convergence. The developed blind channel estimation is used as the basis of an MLSE receiver. The entire scheme is fully implemented in a 65nm CMOS Application Specific Integrated Circuit (ASIC). Experimental measurements and results are presented, including Bit Error Rate (BER) measurements, which demonstrate successful data recovery by the MLSE ASIC under various channel conditions and distances.

© 2013 Optical Society of America

## 1. Introduction

The constant growth in demand for high-bandwidth data transmission poses increasing challenges that must be resolved in the physical layer, and particularly by optical transmission technology.

The current high-end transmission data rates are in the range of hundreds of Gbit/sec. One emerging technology that can support such bitrates over long distances (hundreds of kilometers and above) is coherent transmission and detection [1-8]. On the other hand, direct detection technology uses lower-cost optoelectronic components, consumes less power and enables an overall lower-latency solution. These advantages may be critical for short-reach applications such as sub-hundred-kilometer metro-edge networks and data center interconnections.

However, direct detection (non-coherent) optical technology, while simpler and lower in cost, is limited to lower bit rates and/or shorter distances. For example, increasing the bitrate from 10Gbit/sec to 25Gbit/sec results in a distance reduction from ~80km to ~15km for the same bit error rate (BER) performance. The main reason for this limitation is the inter-symbol interference (ISI) caused by chromatic dispersion (CD) and polarization mode dispersion (PMD). Two techniques are commonly used to combat this ISI. The first is based on advanced modulation formats, together with partial response signaling [9,10], while the other is based on digital signal processing (DSP), applying electronic dispersion compensation (EDC). The EDC implementation with maximum likelihood sequence estimation (MLSE) at the receiver (Rx) side is theoretically the optimal tool to combat ISI, and was very popular for 10Gbit/sec systems [11,12]. The combination of the two approaches is also possible, and was theoretically investigated for 4x25Gbit/sec transmission with the use of reduced-bandwidth components [13].

Here, we focus on blind channel estimation for the MLSE receiver in direct detection systems, which allows upgrading the current 10G systems to 100G (4x25G) systems, with an extended reach of up to 40km over uncompensated links. The task of joint blind channel and data estimation, without a training sequence, is of high importance for MLSE processing in optical communication with direct detection. This topic is widely described in the literature [14-18] and references therein, where most of the focus is on steady-state operation, i.e. the tracking mode. However, blind estimation of the optical channel suitable for the acquisition/initialization stage is less covered. Although various MLSE acquisition methods exist, most of them either require a training sequence, have a low convergence rate, or involve high implementation complexity [19-21].

Here, a novel, simple and fast blind channel estimation method for direct-detection optical systems is proposed. The main contribution of this paper is the blind channel acquisition algorithm, referred to here as the initial metrics determination procedure (IMDP). The initialization of the IMDP is based on the approximate discrete time equivalent (DTE) model, exploiting the most relevant physical properties of the fiber and the nonlinear photo-detector.

The proposed scheme requires neither additional hardware nor additional complicated calculations. The full blind equalization scheme was implemented in an application specific integrated circuit (ASIC) and was validated experimentally *at the full data rate of 4x28Gbit/sec*. The overall blind channel acquisition time is measured to be a few milliseconds, which makes it suitable for use in a reconfigurable optical network environment that requires a 50msec recovery time.

The rest of the paper is organized as follows. The blind MLSE architecture and a brief review of MLSE decoding principles are presented in Section 2. The simplified and approximated DTE model of the overall channel, including transmitter (Tx), fiber and Rx, which serves as a starting point of the IMDP, is derived in Section 3. Section 4 presents the details of the blind channel acquisition algorithm (IMDP). The experimental setup used for the validation of the proposed scheme is described in Section 5. Experimental results of the IMDP for the optical back-to-back (b2b) and 40km channels are demonstrated in Section 6. The performance of the proposed algorithm under various channel conditions is summarized in Section 7. Finally, Section 8 presents concluding remarks.

## 2. Blind MLSE architecture and decoding principles

For a non-coherent system, maximum likelihood sequence estimation is proven to be the most effective stochastic technique for mitigating optical channel impairments such as chromatic dispersion and polarization mode dispersion [17]. While CD is a deterministic phenomenon for a given link, PMD is stochastic in nature; therefore, an adaptive equalizer that performs PMD tracking is required. Moreover, the adaptation properties of the MLSE can also be exploited for CD compensation when the amount of CD is not perfectly known. In effect, expensive tunable optical dispersion compensation may be replaced by the adaptive MLSE. To ensure sufficient tracking, the adaptation rate must be fast compared to the temporal variations of the channel. Since PMD changes on a time scale of $100\mu \mathrm{sec}$ to $1\mathrm{msec}$, the adaptation must be at least ten times faster, meaning that a new channel estimate must be obtained every $10\mu \mathrm{sec}$.

The channel estimates are called *metrics*, and are obtained by taking the (negative) logarithm of the conditional probability density functions (PDFs) of the received samples ${r}_{n}$ given the transmitted sequence $\left[{a}_{n},{a}_{n-1},\mathrm{...},{a}_{n-{N}_{isi}+1}\right]$ of ${N}_{isi}$ consecutive symbols:

The key idea of the MLSE processor is to choose the path ${\Upsilon}_{opt}$ with the smallest running metric ${\Upsilon}_{l}^{\left(k\right)}$ among ${V}^{N}$ candidate sequences of length $N$:

The *histogram method* [15] is used to approximate the PDFs in Eq. (1). Since blind equalization is pursued, the histograms are collected in a decision-directed manner, as shown in Fig. 1.

The data path consists of the MLSE decoder which processes the samples coming from the analog-to-digital converter (ADC), based on current channel estimate, and passes the outcome bits (or symbols) to the data aggregator for further processing.

In the control path, three blocks carry out the channel estimation task as follows. First, the properly delayed incoming samples are attributed to the output of the MLSE decoder, denoted here as the “message”. Each incoming sample, assigned to a group of ${N}_{isi}+1$ consecutive bits (or symbols) in the “message”, forms an “event”. Next, the “events” are counted, and a histogram set $H$, containing ${V}^{{N}_{isi}+1}$ branches, is obtained:

After the *log* operation, the branch metrics $M=\left\{{M}_{l}\left({r}_{n}|{a}_{n},{a}_{n-1},\mathrm{...},{a}_{n-{N}_{isi}+1}\right),\text{\hspace{0.17em}}\text{\hspace{0.17em}}l=0,\mathrm{...},{V}^{{N}_{isi}+1}\right\}$ are obtained, forming the current channel estimate. In the steady state (tracking mode), the histograms, and thus the metrics, are updated iteratively, based on the observed data.
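The decision-directed histogram collection and the conversion to metrics described above can be sketched as follows. This is only an illustrative sketch, not the ASIC implementation: the function names, the ordering of bits inside the branch index (current decision as the most significant digit) and the `floor` value used for empty bins are all assumptions made for the example.

```python
import math

def collect_histograms(samples, decisions, n_isi, n_adc_bits, vocab=2):
    """Count "events": each ADC code is attributed to the branch formed by
    the current decision and the n_isi previous decisions."""
    n_branches = vocab ** (n_isi + 1)
    n_levels = 2 ** n_adc_bits
    hist = [[0] * n_levels for _ in range(n_branches)]
    for n in range(n_isi, len(samples)):
        branch = 0
        for k in range(n_isi + 1):
            # Assumed ordering: decisions[n] is the most significant digit.
            branch = branch * vocab + decisions[n - k]
        hist[branch][samples[n]] += 1
    return hist

def histograms_to_metrics(hist, floor=1e-6):
    """Negative log of each normalized histogram (empty bins are floored)."""
    metrics = []
    for row in hist:
        total = max(sum(row), 1)
        metrics.append([-math.log(max(count / total, floor)) for count in row])
    return metrics

# Toy example: N_isi = 1, 2-bit ADC codes, four decided bits.
hist = collect_histograms([0, 3, 3, 1], [0, 1, 1, 0], n_isi=1, n_adc_bits=2)
metrics = histograms_to_metrics(hist)
```

In tracking mode the same two steps would simply be repeated on fresh data, so that the metrics follow the channel.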

## 3. Blind channel acquisition - initial metrics determination procedure (IMDP)

The algorithmic flowchart of the blind MLSE acquisition stage, referred to here as the initial metrics determination procedure (IMDP), is depicted in Fig. 2.

The IMDP can be divided into four main phases. In the first phase, a metrics set $M$ is taken from the predefined bank. Then, an iterative decoding procedure is activated, and several ($X$) decision-directed adaptation loops (later used as the tracking loops) are carried out. The goal of the third phase is to check whether the resulting metrics have converged. If convergence is not achieved, the next metrics set is taken from the bank. Otherwise, an additional optimization procedure that maximizes the amount of ISI compensated by the MLSE is applied. If the initial metrics bank runs out of metrics ($j>{J}_{\mathrm{max}}$), an interrupt is generated to the central processing unit (CPU), which may decide to start the IMDP over.

The next four sections describe the details of the four IMDP phases, and answer the following questions: *How to define the bank of initial metrics that assures convergence (phase #1)? How is convergence defined, and what are the criteria that indicate convergence (phase #3)? How to monitor the convergence process (phase #2)? What is the ISI optimization stage (phase #4) and why is it needed?*
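The four-phase flow can be summarized by the following control-loop sketch. It only mirrors the ordering of the phases described above; the callables `adapt`, `has_converged` and `optimize_match_point` are hypothetical placeholders for the hardware blocks, and returning `None` stands in for the interrupt raised to the CPU.

```python
def run_imdp(bank, adapt, has_converged, optimize_match_point, x_loops=8):
    """Return (bank index, optimized metrics), or None if the bank is exhausted."""
    for j, metrics in enumerate(bank):            # phase 1: next initial set
        for _ in range(x_loops):                  # phase 2: X adaptation loops
            metrics = adapt(metrics)
        if has_converged(metrics):                # phase 3: convergence test
            return j, optimize_match_point(metrics)   # phase 4: ISI optimization
    return None                                   # bank exhausted: caller may restart

# Toy usage with scalar stand-ins for metric sets (purely illustrative).
result = run_imdp(
    bank=[0, 5, 10],
    adapt=lambda m: m + 1,
    has_converged=lambda m: m >= 13,
    optimize_match_point=lambda m: m * 2,
)
```

In the worst case every entry of the bank is tried once, which is why the acquisition time grows linearly with the bank size.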

## 4. Definition of the metrics bank $M$

#### 4.1 The approximate overall channel DTE model

Direct detection optical channel systems are nonlinear in nature, mainly due to the square-law operation in the photo-detector and the intensity dependence of the fiber refractive index (the Kerr effect) [22]. Thus, the noiseless incoming sample is represented by a nonlinear combination of the transmitted symbol ${a}_{n}$ and the past ${N}_{isi}^{\left(channel\right)}$ symbols:

The recorded signal at the photo-detector (PD) output is given by:

#### 4.2 Definition of the metrics bank $M$ for phase #1

The key function that enables the blind MLSE processing is the proper definition of the metrics bank $M\triangleq \left\{{M}^{\left(j\right)},j=0,\mathrm{...},{J}_{\mathrm{max}}-1\right\}$, which allows operation in decision-directed mode. One option is to prepare a predetermined metrics bank, for example by transmitting known data (a training sequence) and then generating and storing several metric sets for different channel conditions, as described in Fig. 1. Then, once deployed in the system, the IMDP, described in Fig. 2, can be activated, and a proper initial metrics set can be selected from the bank $M$. This acquisition procedure is considered blind since, in the field, no training sequence is required for either channel or data estimation. However, this approach still requires some indirect knowledge about the channel in order to define optimal criteria, and therefore is not pursued here.

A different and novel approach for the definition of the metrics bank $M$, based on the method of moments (MoM) combined with knowledge of the physical behavior of the optical fiber, is proposed. Since only a coarse channel representation is needed, it may be assumed that the branch histograms ${H}_{l}\left({r}_{n}|{a}_{n},{a}_{n-1},\mathrm{...},{a}_{n-{N}_{isi}+1}\right)$ have a nearly Gaussian shape and differ from each other only by the mean and variance, as in [23-24]. The mean values depend on the channel memory length ${N}_{isi}^{\left(channel\right)}$, the data vocabulary size $V$, and the dominant noise mechanism in the system. To ensure proper operation, the decoder must be designed such that the channel memory length is at most the memory length of the decoder: ${N}_{isi}\ge {N}_{isi}^{\left(channel\right)}$. In this case, there are ${V}^{{N}_{isi}+1}$ branches, whereas the variance of each histogram is associated with the noise power that is present in the corresponding symbol combination describing the branch. For example, in a memory-less channel with binary vocabulary ($V=2$) there are two histograms, representing the corresponding conditional PDFs, and a simple hard decision scheme can be used. When $V=2$ and ${N}_{isi}^{\left(channel\right)}=1$ there are four distinct histograms, having four different mean values. Generally, when ${N}_{isi}\ge {N}_{isi}^{\left(channel\right)}$, the actual number of histograms in the given MLSE decoder is constant, ${N}_{br}={V}^{{N}_{isi}+1}$, and the histograms fall into groups whose members are identical. Continuing the example ($V=2$ and ${N}_{isi}^{\left(channel\right)}=1$), for ${N}_{isi}=4$ there are 32 branches. These branches can be divided into four groups of eight, associated with the four different mean values mentioned above.
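The grouping argument can be checked by direct enumeration. The short sketch below assumes that the first entries of each branch tuple are the symbols the channel actually "sees" (the current symbol and the most recent past ones); the function name and this ordering are illustrative assumptions.

```python
from itertools import product

def group_branches(n_isi, channel_memory, vocab=2):
    """Group the V**(n_isi+1) decoder branches by the symbols that fall
    inside the channel memory; branches in one group share a histogram."""
    groups = {}
    for branch in product(range(vocab), repeat=n_isi + 1):
        # Assumed ordering: branch[0] is a_n, branch[1] is a_(n-1), ...
        key = branch[:channel_memory + 1]
        groups.setdefault(key, []).append(branch)
    return groups

# The example from the text: V = 2, channel memory 1, decoder memory 4.
groups = group_branches(n_isi=4, channel_memory=1)
n_branches = sum(len(members) for members in groups.values())
```

For these parameters the enumeration gives 32 branches in 4 groups of 8, matching the counts stated above.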

Based on the argumentation above, the problem of selecting the proper metrics bank $M$ can be formulated as follows: *Find the set of ${V}^{{N}_{isi}+1}$ mean values and corresponding variances that, together with the Gaussianity assumption and correct ordering, lead to conditional PDFs that coarsely but still reliably describe the channel, i.e. result in a BER that is low enough ($<{10}^{-2}$) to allow operation in decision-directed mode.*

Thus, we look for a bank of metrics $M$, which are derived from histogram sets $H\triangleq \left\{{H}^{\left(j\right)},j=0,\mathrm{...},{J}_{\mathrm{max}}\right\}$, having Gaussian shapes with the mean value vectors ${\mu}_{j}$ and corresponding variance vectors ${\sigma}_{j}^{2}$. Hence the metrics in $M$ have the following form:

The values ${\mu}_{j}$ can be determined by the FIR approximation of the operator $\Gamma (\cdot )$ given by Eq. (15). Without loss of generality, the following analysis is restricted to the simplest on-off-keying (OOK) modulation format, i.e. $V=2$.

First, it is assumed that the non-return-to-zero (NRZ) shaping pulse at the Tx is represented by the impulse response ${h}_{Tx}\left[n\right]={K}_{1}{\delta}_{n}$ in the DTE model, where ${N}_{Tx}=1$ in Eq. (10), and the constant ${K}_{1}$ depends on the transmitted power. Second, it is assumed that the bandwidth of the optical filter (OF) is wide enough, such that at the sampling point, the DTE impulse response of the OF is ${h}_{OF}\left[n\right]={K}_{2}{\delta}_{n}$, where ${N}_{OF}=1$ and ${K}_{2}$ depends on the OF shape. In practice, however, the length ${N}_{OF}$ of ${h}_{OF}\left[n\right]$ may be longer than a single symbol duration, especially in an environment of concatenated optical filtering (with optical add-drop multiplexers). Consequently, according to Eq. (10), the length of the scalar impulse response is dominated by the length ${N}_{CD}$ of ${h}_{CD}\left[n\right]$, and Eq. (7) can be rewritten as:

Using similar argumentation, it can be assumed that ${h}_{Rx}\left[n\right]={K}_{3}{\delta}_{n}$. Noting that for an OOK format ${a}_{n}={\left|{a}_{n}\right|}^{2}$, and substituting Eqs. (6), (14) and (17) into Eq. (15) yields:

*non-negative* proportionality constant that depends on the responsivity and shapes of the Tx, optical and Rx filters.

The first two terms of Eq. (18) represent the linear part of ${r}_{n}$, and can be regarded as the *sum of the responses of two FIR filters with rectangular shapes, relatively delayed by* $\tau $:

*a data dependent FIR filters,* whose coefficients are proportional to $\mathrm{cos}\left(\pi \cdot W\cdot {\left(k-l\right)}^{2}\right)$:

*data-dependent coefficients* are non-zero. On average this filter can be approximated as:

Equations (19) and (20) summarize the exact mathematical model of the overall channel DTE FIR. For pragmatic acquisition purposes a coarse approximation is proposed. A closer examination of Eqs. (19) and (20) reveals that while Eq. (19) represents rectangular shape, Eq. (20) represents the sum of half period cosine terms multiplied by the random data samples. It can be shown empirically (by plotting the sum of Eqs. (19) and (20) for various data, CD and PMD values) that the FIR-equivalent filter can be approximated by either pre-cursor dominating ISI, post-cursor dominating ISI or symmetrical ISI filters.

Consequently, the bank of metrics $M$ can be generated by the following set of FIR filters, $B\triangleq \left\{{b}_{j},\text{\hspace{0.17em}}j=0,\mathrm{...},{J}_{\mathrm{max}}\right\}$, where ${b}_{j}$ is given by:

Figure 3 illustrates examples of the FIRs corresponding to each line in Eq. (22) for $c=2$ and ${N}_{isi}=4$.

In practical implementations, like the one considered here, the MLSE decoder memory length is typically small (${N}_{isi}<5$), and the number of elements in $M$ is finite and not too large. For example, in the MP1100 ASIC presented here, ${N}_{isi}=4$, resulting in ${J}_{\mathrm{max}}=10$ matrices in the bank as dictated by Eq. (22): 4 matrices with pre-cursor ISI, 4 with post-cursor ISI and 2 with symmetric ISI behavior. The overall acquisition time of the IMDP in the worst case (when all matrices in the bank have to be examined) increases linearly with ${J}_{\mathrm{max}}$.

The mean vectors ${\mu}_{j}$ in (16) can be obtained using Eq. (22) as follows:

where, in the simplest case, $A$ is the ${V}^{{N}_{isi}+1}\times \left({N}_{isi}+1\right)$ matrix having all possible combinations of symbols in the vocabulary in increasing order, and ${N}_{ADC}$ is the nominal bit count of the ADC. Similarly, the vector of variance values ${\sigma}_{j}^{2}$ can be calculated as follows:

where $S$ is the ${V}^{{N}_{isi}+1}\times \left({N}_{isi}+1\right)$ matrix, having all possible combinations of symbols in the vocabulary in increasing order like $A$, but with the values of the vocabulary symbols replaced by the variance values corresponding to these symbols in an ISI-free scenario. The variance values are typically derived from the SNR conditions in the system. In an optically amplified system the standard deviation for ‘1’ is higher than for ‘0’, depending on the OSNR conditions. For example, taking $V=2,{N}_{isi}=1$, the matrices $A$ and $S$ have the following form:
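As an illustration of the construction just described, the following sketch enumerates the rows of $A$ and $S$ for $V=2$, ${N}_{isi}=1$. The function name is hypothetical, and the two variance values (0.01 for ‘0’ and 0.04 for ‘1’) are arbitrary placeholders chosen only so that ‘1’ is noisier than ‘0’; they are not measured values.

```python
from itertools import product

def build_symbol_matrices(n_isi, symbols, variances):
    """Return (A, S): every combination of n_isi+1 symbols in increasing
    order, and the same layout with each symbol replaced by its variance."""
    width = n_isi + 1
    A = [list(row) for row in product(symbols, repeat=width)]
    S = [list(row) for row in product(variances, repeat=width)]
    return A, S

# V = 2, N_isi = 1; the variance values are placeholders, not measurements.
A, S = build_symbol_matrices(1, symbols=[0, 1], variances=[0.01, 0.04])
# A rows: [0, 0], [0, 1], [1, 0], [1, 1]
```

Both matrices have ${V}^{{N}_{isi}+1}=4$ rows and ${N}_{isi}+1=2$ columns, as required by the definitions above.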

#### 4.3 The convergence test (phase #3) and convergence criterion

For the $X$ learning loops of phase #2 to yield an $M$ that describes the channel reliably enough for successful operation in decision-directed mode ($BER<{10}^{-2}$), the histograms in the corresponding $H$ must possess certain statistical properties.

The only assumption that forms the basis of the derivation of these properties is that the transmitted symbols are equiprobable, i.e.:

Note that this assumption is also needed for using MLSE instead of the maximum a posteriori probability (MAP) algorithm, and generally holds in practical systems which employ source coding and scrambling. In turn, Eq. (26) implies that the probability to transmit any combination of ${N}_{isi}+1$ consecutive symbols is:

Therefore, if the decoder works correctly and the channel estimate $M$ is reliable, there are ${N}_{br}={V}^{{N}_{isi}+1}$ branches in $H$, each having an equal probability $p$ to appear. In other words, the probability to assign the observation at the decoder input ${u}_{n}$ to the correct combination of ${N}_{isi}+1$ consecutive decisions at the decoder output ${\Gamma}_{i}$ is $p$, i.e. the probability of the “event” ${u}_{n}\in {\Gamma}_{i},\text{\hspace{0.17em}}i=0,\mathrm{...},{V}^{{N}_{isi}+1}$ is binomially distributed and given by:

#### 4.4 Convergence monitoring during phase #2

Based on the argumentation in Section 4.3, one may use the sample standard deviation of the central moments after the $d$-th iteration, designated as $std\left({m}_{0}\right)\left[d\right]$, in order to monitor the convergence tendency of $H$ during phase #2:

In addition, a second figure of merit, based on a training sequence, is proposed for illustration purposes only. In this case the histogram set ${H}_{training}$ is known, and one can measure the closeness of the obtained set ${H}_{blind}$ by means of the sample Kullback-Leibler (KL) distance:
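The two monitoring quantities can be illustrated with the sketch below. It is not a transcription of the paper's equations: here the zero-th moment of a branch is assumed to be its share of all observations (so that, with equiprobable symbols, every branch is expected to collect about $1/{N}_{br}$ of them), and the KL distance is assumed to be a plain sum of per-branch discrete KL divergences with a small `eps` guarding empty bins.

```python
import math

def zeroth_moments(hist):
    """Fraction of all observations that landed in each branch histogram."""
    total = sum(sum(row) for row in hist)
    return [sum(row) / total for row in hist]

def std_m0(hist):
    """Sample standard deviation of the zero-th moments across branches."""
    m0 = zeroth_moments(hist)
    mean = sum(m0) / len(m0)
    return math.sqrt(sum((m - mean) ** 2 for m in m0) / (len(m0) - 1))

def kl_distance(h_ref, h_est, eps=1e-9):
    """Sum over branches of KL(ref || est) between normalized histograms."""
    total = 0.0
    for ref_row, est_row in zip(h_ref, h_est):
        ref_n, est_n = max(sum(ref_row), 1), max(sum(est_row), 1)
        for r, e in zip(ref_row, est_row):
            p, q = r / ref_n, e / est_n
            if p > 0:
                total += p * math.log(p / max(q, eps))
    return total

balanced = [[5, 5], [10, 0]]     # both branches hold half of the events
skewed = [[10, 0], [0, 0]]       # every event fell into one branch
```

A balanced set gives `std_m0` equal to zero, while a set in which the decoder assigns all events to a few branches gives a large value, which is the behavior the monitoring relies on.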

#### 4.5 Match point (MP) and ISI optimization (phase #4)

As already discussed in previous sections, and formulated by Eq. (4), the incoming sample is a nonlinear combination of the current symbol ${a}_{n}$ and ${N}_{isi}^{\left(channel\right)}$ previous symbols. The MLSE equalizer therefore operates optimally if the memory length of the decoder ${N}_{isi}$ is at least the channel memory, i.e. ${N}_{isi}\ge {N}_{isi}^{\left(channel\right)}$. However, in real-life scenarios the opposite holds, i.e. ${N}_{isi}<{N}_{isi}^{\left(channel\right)}$. In this case, the MLSE equalizer performs sub-optimally, since it takes care of only the first ${N}_{isi}$ terms, leaving some portion of *residual ISI* uncompensated. This residual ISI is treated by the decoder as noise, and is reflected in the variances of the branch histograms:

To illustrate this point, consider a simple FIR channel with ${N}_{isi}^{\left(channel\right)}$ coefficients. The noiseless received sample ${r}_{n}$ is given by:

The ISI in the system can be divided into two groups: the ISI handled by the MLSE with memory of ${N}_{isi}$ symbols and the residual ISI. The handled ISI should be selected according to a peak-distortion criterion:

The optimal ${n}_{0}$ is called the match point (MP). In practice, the ISI optimization is done as follows: several channel estimates (histograms) are collected, each time with a different MP-shift ${n}_{0}$ set between the stream of ADC samples and the stream of the corresponding decision bits. Thus, each histogram represents a selection of a different subset of the channel ISI to be compensated by the MLSE. The contribution of ${\sigma}_{noise}^{2}\left(l\right)$ and ${\sigma}_{ADC}^{2}$ in Eq. (39) is the same on average. Therefore, the average of the histogram variances changes between these ${n}_{0}$ shifts, and is determined by ${\sigma}_{residual\text{\hspace{0.17em}}ISI}^{2}$. Hence, the selected ${n}_{0}$ (the correct MP-shift) is the one that yields the minimal average of the histogram variances:
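The selection rule stated in the last sentence (pick the shift whose histogram set has the smallest average variance) can be sketched as follows. The function names and the dictionary layout (shift mapped to a histogram set, each histogram being a list of counts per ADC level) are assumptions for illustration; treating an empty branch as zero variance is likewise a simplification.

```python
def branch_variance(row):
    """Variance of the ADC level within one branch histogram of counts."""
    n = sum(row)
    if n == 0:
        return 0.0  # simplification: an empty branch contributes nothing
    mean = sum(level * count for level, count in enumerate(row)) / n
    return sum(count * (level - mean) ** 2 for level, count in enumerate(row)) / n

def mean_branch_variance(hist):
    """Average of the branch variances over a whole histogram set."""
    return sum(branch_variance(row) for row in hist) / len(hist)

def select_match_point(hists_by_shift):
    """Return the MP-shift n0 whose histogram set has minimal mean variance."""
    return min(hists_by_shift, key=lambda n0: mean_branch_variance(hists_by_shift[n0]))

# Toy data: narrow histograms at shift 0, smeared ones at the other shifts.
narrow = [[0, 10, 0], [0, 0, 10]]
wide = [[5, 0, 5], [3, 4, 3]]
best_shift = select_match_point({-1: wide, 0: narrow, 1: wide})
```

Because the noise and ADC contributions are the same for all shifts on average, differences in this average are attributed to the residual ISI alone.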

## 5. Experimental setup and ASIC parameters

The proposed IMDP method was implemented within the MP1100Q ASIC and was verified experimentally using the optical setup shown in Fig. 4:

A pseudo-random bit sequence (PRBS) of length ${2}^{31}-1$ was generated and amplified by the driver to modulate the optical carrier. A Mach-Zehnder modulator (MZM) following a 1550nm DFB laser was used. The optical channel was a standard single mode fiber (SSMF) including an optical amplifier and an ASE noise source. An optical spectrum analyzer was used to measure the optical signal-to-noise ratio (OSNR). A 50GHz optical filter was used to reduce the amount of noise received at the PIN photo-detector (PD). The received electrical signal was processed by the MP1100, which has a built-in PRBS checker that was used to obtain the BER results.

The MP1100 has an ADC with a nominal resolution of ${N}_{ADC}=5$ bits and an effective number of bits (ENOB) of ~3.8 bits. An analog phase-lock loop (PLL) is used to recover the symbol clock, while the data is sampled at the symbol rate of 28Gsymbol/sec. The MLSE equalizer memory depth is ${N}_{isi}=4$ symbols, and its principal architecture is shown in Fig. 1.

## 6. IMDP explanation with experimental examples

In order to demonstrate the operation of the proposed blind channel acquisition algorithm (IMDP), we describe and analyze the outcome of the intermediate procedure phases (Fig. 2) for two cases: back-to-back channel and 40km channel.

#### 6.1 Back to back channel

Various steps of the IMDP are summarized in Figs. 5-8.

Figure 5 shows two different histogram sets ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ that can represent PDFs describing a channel with a memory depth of one symbol. The difference is that the histograms on the left-hand side correspond to the increasing exponent $j=0$ in (22), whereas the histograms on the right-hand side represent the decreasing exponent channel $j=4$ in (22) for ${N}_{isi}=4$. In fact, the branch (or histogram) labeled '01' in Fig. 5(a) has different mean and variance values as compared to its counterpart in Fig. 5(b). The same is true for branch '10'. On the other hand, the edge branches '00' and '11' have the same mean and variance values due to the symmetry present in (22).

It is worth emphasizing that both histogram sets ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ contain 32 branches each, which are divided into 4 groups, where each group is described by its mean and variance values (which coincide with the 4 histograms shown in Fig. 5). ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ are the outcome of phase #1, and serve as a starting point for phase #2 of the IMDP.

After 8 iterations during phase #2, the histogram sets ${H}_{\#2}^{\left(0\right)}$ and ${H}_{\#2}^{\left(1\right)}$ are obtained, starting from ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ respectively. Comparing Fig. 6(a) and Fig. 6(c) shows that the initial guess ${H}_{\#1}^{\left(0\right)}$ was not successful. On the other hand, comparison between Fig. 6(b) and 6(c) suggests that ${H}_{\#1}^{\left(1\right)}$ is a better guess, which converges to a histogram map similar to ${H}_{training}$ and is therefore a successful initial guess. In addition to the visual effect, the similarity between ${H}_{\#2}^{\left(1\right)}$ and ${H}_{training}$ can be quantitatively measured by means of the parameter ${D}_{ED}$ defined in (38). In addition, the convergence rate can be quantified with the use of (35).

The values of ${D}_{ED}$ throughout the 8 iterations of phase #2 are shown in Fig. 7(a). In our experiments, $X=8$ was the worst case for the number of iterations the IMDP needed to converge. It can be easily seen that ${H}_{\#1}^{\left(0\right)}$ (blue circles) diverges, whereas ${H}_{\#1}^{\left(1\right)}$ (green squares) converges to zero in terms of ${D}_{ED}$, meaning that the obtained ${H}_{\#2}^{\left(1\right)}$ converges to ${H}_{training}$.

The convergence rate in terms of the standard deviation of the central moments, as defined in (35), is presented in Fig. 7(b). Similarly, convergence of ${H}_{\#1}^{\left(1\right)}$ (green squares) and divergence of ${H}_{\#1}^{\left(0\right)}$ (blue circles) are observed. The (final) value of $std\left({m}_{0}\right)$ for ${H}_{\#2}^{\left(1\right)}$ also goes to zero, indicating that all the histograms in ${H}_{\#2}^{\left(1\right)}$ have a similar number of observations. The bit error rate convergence during phase #2 is presented in Fig. 7(c) and compared to the BER obtained by training (solid purple line). The convergence in terms of BER is slower during the acquisition phase #2, as compared to the ${D}_{ED}$ and $std\left({m}_{0}\right)$ convergences. However, it is shown in the following figures that the final BER convergence, at the end of the IMDP, is similar to the training case.

The validation of the convergence criterion Eq. (34), which forms phase #3 of the IMDP, is presented in Fig. 7(d), where the values of the moments ${m}_{0}^{\left(i\right)}$ for various histograms in ${H}_{\#2}^{\left(0\right)}$ (blue circles) and ${H}_{\#2}^{\left(1\right)}$ (green squares) are presented. It is clearly seen that the ${m}_{0}^{\left(i\right)}$ of ${H}_{\#2}^{\left(1\right)}$ lie within the upper and lower thresholds defined by Eq. (35), as opposed to their ${H}_{\#2}^{\left(0\right)}$ counterparts. It should be stressed that in the hardware implementation Eq. (34) is applied only to the ${m}_{0}^{\left(i\right)}$ of ${H}_{\#2}^{\left(1\right)}$, to reduce complexity and the duration of the acquisition process. The zero-th moments of ${H}_{\#2}^{\left(0\right)}$ are presented here only for clarity and comparison.

The results of the ISI optimization (phase #4) for the histogram set ${H}_{\#2}^{\left(1\right)}$ are shown in Fig. 8. In each sub-plot, the title contains the MP-shift, the BER and the average histogram variance calculated according to Eq. (45). In this “simple” back-to-back case, the major portion of the ISI comes from the frequency response of the analog front-end of the ASIC. It can be seen in Fig. 8 that the effective smearing is between 1 and 2 symbol periods, resulting in three histogram sets, identified in Figs. 8(c)-8(e). The resulting BER in Figs. 8(c)-8(e) is similar (since the memory depth of the implemented MLSE is higher than the effective smearing, ${N}_{isi}>{N}_{isi}^{channel}$). The optimal shift is chosen such that the residual ISI is minimal according to Eq. (45), which also results in the lowest BER. In this case the optimal shift is zero, which also agrees with the fact that the histogram set from Fig. 8(c) is very close in terms of ${D}_{ED}$ to the histogram set obtained by using the training sequence, shown in Fig. 6(c), and having the same shift.

#### 6.2 40km link

In this section a detailed example is presented for the case of a 40km optical link. As predicted by Eq. (6), the initial histogram sets representing the channel with a memory of one symbol duration (${N}_{isi}^{channel}=1$), ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ depicted in Fig. 5, would not be sufficient since they result in a high initial BER.

Therefore, if the link length is known a priori, one can directly start the IMDP from histogram sets with higher channel memory, e.g. ${N}_{isi}^{channel}=2$ in this case. However, in this paper we do not assume any side information about the link length (up to the maximal length that the current hardware supports, which is about 50km). Consequently, the whole IMDP procedure is repeated starting from three histogram sets: ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$, shown in Fig. 5, and ${H}_{\#1}^{\left(2\right)}$, depicted in Fig. 9.

Phase #2 of the IMDP also consists of 8 iterations, resulting in the histogram sets ${H}_{\#2}^{\left(0\right)}$, ${H}_{\#2}^{\left(1\right)}$ and ${H}_{\#2}^{\left(2\right)}$ presented in Figs. 10(a), 10(b) and 10(c) respectively. The histogram set obtained by using the training sequence, ${H}_{training}$, is depicted in Fig. 10(d) for comparison.

It can be seen from Fig. 10 that the histogram sets without sufficient channel memory, ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$, indeed diverge, whereas ${H}_{\#1}^{\left(2\right)}$ provides a good initial guess. The titles of Figs. 10(a), 10(b) and 10(c) contain the $std\left({m}_{0}\right)$ values after the 8th iteration, indicating quantitatively that only ${H}_{\#1}^{\left(2\right)}$ converges successfully.

A closer look at the convergence process during phase #2 of the IMDP is shown in Fig. 11. The convergence in terms of ${D}_{ED}$ for the three histogram sets ${H}_{\#1}^{\left(0\right)}$ (blue circles), ${H}_{\#1}^{\left(1\right)}$ (green squares) and ${H}_{\#1}^{\left(2\right)}$ (red triangles) is shown in Fig. 11(a).

The first observation is that all three sets appear to stabilize around a constant ${D}_{ED}$ value, but as already known from Fig. 10, only ${H}_{\#2}^{\left(2\right)}$ tends to resemble ${H}_{training}$. The fact that ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ stabilize around a constant ${D}_{ED}$ value does not imply that they converged to a correct channel estimate. Rather, the opposite is true, and a closer look at Figs. 10(a) and 10(b) reveals that the resulting histograms in both sets are spread almost uniformly throughout the ensemble range (the x-axis).

Another interesting observation is that despite the fact that ${H}_{\#2}^{\left(2\right)}$ converges, the final ${D}_{ED}$ value (after 8 iterations) for ${H}_{\#2}^{\left(2\right)}$ is higher than for ${H}_{\#2}^{\left(0\right)}$ and ${H}_{\#2}^{\left(1\right)}$, which eventually diverge. The reason for this is that ${H}_{\#1}^{\left(2\right)}$ converged to a suboptimal solution. ${H}_{\#2}^{\left(2\right)}$ is indeed quantitatively 'far' from ${H}_{training}$, since the ${D}_{ED}$ between them is not close to zero. This suboptimal solution will be improved during phase #4 of the IMDP. Thus, generally speaking, the KL-distance does not immediately show success, since several suboptimal solutions are possible, and only the optimal reference PDF (or its histogram representative) is relevant for comparison.

On the other hand, by observing the intermediate values of the $std\left({m}_{0}\right)$ criterion (Fig. 11(b)), one can conclude that ${H}_{\#1}^{\left(0\right)}$ and ${H}_{\#1}^{\left(1\right)}$ diverge, whereas ${H}_{\#2}^{\left(2\right)}$ converges (to a valid suboptimal solution). The BER convergence, shown in Fig. 11(c), also reveals that the first two histogram sets diverge (BER = 0.5) and the third set converges to a suboptimal solution, indicated by a slightly higher BER than the one obtained with ${H}_{training}$ (solid purple curve).
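The BER = 0.5 signature of divergence can be illustrated with a short sketch. It assumes balanced binary data and a hypothetical diverged detector whose decisions do not depend on the transmitted bits; the bit patterns below are toy values, not measurements.

```python
def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the decided bit differs from the sent bit."""
    if not tx_bits or len(tx_bits) != len(rx_bits):
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)

# Balanced toy pattern: eight zeros and eight ones.
tx = [0, 1] * 8

# Hypothetical diverged detector: output is stuck at 0 regardless of the input.
rx_diverged = [0] * len(tx)

# Hypothetical converged detector with a single decision error.
rx_converged = list(tx)
rx_converged[3] ^= 1

ber_diverged = bit_error_rate(tx, rx_diverged)    # 0.5
ber_converged = bit_error_rate(tx, rx_converged)  # 0.0625
print(ber_diverged, ber_converged)
```

Any detector whose output carries no information about balanced data sits at a BER of one half, so that value marks a failed channel estimate rather than a merely noisy one.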

The practical way to determine whether a given histogram set has converged to a valid (possibly suboptimal) solution, without inspecting the resulting histogram sets ${H}_{\#2}^{\left(0\right)}$, ${H}_{\#2}^{\left(1\right)}$ and ${H}_{\#2}^{\left(2\right)}$, is to verify that all the zeroth moments of the histograms within the set lie within a predefined range, given by (34). Figure 11(d) shows that only ${H}_{\#2}^{\left(2\right)}$ meets (33), and is thus the only metric selected for processing in phase #4.
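The zeroth-moment test described above can be sketched as follows. The sketch assumes that the zeroth moment ${m}_{0}$ of a histogram is its total number of entries and that the acceptance range is a symmetric band around the mean count per histogram; the `tolerance` parameter and the helper names are hypothetical stand-ins for the range defined in the paper.

```python
import statistics

def zeroth_moments(hist_set):
    """Zeroth moment of each histogram, taken here as its total count."""
    return [sum(hist) for hist in hist_set]

def acceptance_range(hist_set, tolerance=0.5):
    """Hypothetical range: mean count per histogram, plus or minus a relative tolerance."""
    moments = zeroth_moments(hist_set)
    expected = sum(moments) / len(moments)
    return (1 - tolerance) * expected, (1 + tolerance) * expected

def is_valid_set(hist_set, tolerance=0.5):
    """True if every zeroth moment lies inside the acceptance range."""
    low, high = acceptance_range(hist_set, tolerance)
    return all(low <= m0 <= high for m0 in zeroth_moments(hist_set))

def std_m0(hist_set):
    """Standard deviation of the zeroth moments across the set."""
    return statistics.pstdev(zeroth_moments(hist_set))

# Toy sets (hypothetical counts): four histograms, 400 samples in total.
converged_set = [[10, 40, 40, 10], [5, 45, 45, 5], [20, 30, 30, 20], [25, 25, 25, 25]]
diverged_set = [[0, 1, 0, 0], [0, 0, 2, 0], [100, 100, 100, 97], [0, 0, 0, 0]]

print(is_valid_set(converged_set), std_m0(converged_set))
print(is_valid_set(diverged_set), std_m0(diverged_set))
```

In the diverged toy set nearly all samples collapse into a single histogram, so the zeroth moments fall outside the band and their standard deviation is large, whereas the converged toy set populates all histograms evenly.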

The histogram sets representing the outcome of the ISI optimization phase of the IMDP are shown in Fig. 12. As indicated by both BER and the average standard deviations (ref formula for mean(vars)), the MP-shift of one symbol (Fig. 12(d)) obtains the best performance.

In addition, in Fig. 12(d), ${D}_{ED}\left({H}_{training}\left|\right|{H}_{\#4}^{\left(1\right)}\right)=0.127$ is the lowest value achieved, which verifies that the optimal solution is obtained.

## 7. Summary of experimental IMDP performance

Figure 13 summarizes the experimental measurements, comparing the BER results for various OSNR values obtained by the data-aided approach (training sequence) with those obtained by the blind channel acquisition algorithm (IMDP) proposed here. The pre-FEC BER level of ${10}^{-3}$ is also shown for convenience.

It can be seen that the proposed blind IMDP technique achieves BER values that are in very good agreement with those obtained using a training sequence. This indicates that the proposed blind technique obtains a reliable channel estimate for OSNR values that result in $BER<{10}^{-2}$.

## 8. Conclusion

A novel blind channel acquisition technique for MLSE equalization in high speed optical communications has been proposed. It performs joint channel and data estimation in decision-directed mode. The new method was implemented in a high speed 100G ASIC and has been verified by real-time lab experiments. The proposed method eliminates the need for a training sequence or any side information about the channel. It is robust and adaptive, can cope with a wide range of optical channel conditions, and can be used in a wide range of receivers that include an MLSE engine.

## References and links

1. M. G. Taylor, “Coherent detection method using DSP for demodulation of signal and subsequent equalization of propagation impairments,” IEEE Photon. Technol. Lett. **16**(2), 674–676 (2004).

2. E. Ip and J. M. Kahn, “Digital equalization of chromatic dispersion and polarization mode dispersion,” J. Lightwave Technol. **25**(8), 2033–2043 (2007).

3. S. J. Savory, “Compensation of fibre impairments in digital coherent systems,” ECOC 2008, 21–25 (Brussels, Belgium, 2008).

4. S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express **16**(2), 804–817 (2008).

5. C. R. S. Fludger, T. Duthel, D. van den Borne, C. Schulien, E. D. Schmidt, T. Wuth, J. Geyer, E. De Man, G.-D. Khoe, and H. de Waardt, “Coherent equalization and POLMUX-RZ-DQPSK for robust 100-GE transmission,” J. Lightwave Technol. **26**(1), 131–141 (2008).

6. K. Roberts, M. O'Sullivan, K. T. Wu, H. Sun, A. Awadalla, D. J. Krause, and C. Laperle, “Performance of dual-polarization QPSK for optical transport systems,” J. Lightwave Technol. **27**(16), 3546–3559 (2009).

7. M. Kuschnerov, F. N. Hauske, K. Piyawanno, B. Spinnler, M. S. Alfiad, A. Napoli, and B. Lankl, “DSP for coherent single-carrier receivers,” J. Lightwave Technol. **27**(16), 3614–3622 (2009).

8. A. Gorshtein, O. Levy, G. Katz, and D. Sadot, “Coherent compensation for 100G DP-QPSK with one sample per symbol based on anti-aliasing filtering and blind equalization MLSE,” IEEE Photon. Technol. Lett. **22**(16), 1208–1210 (2010).

9. P. J. Winzer and R.-J. Essiambre, “Advanced optical modulation formats,” Proc. IEEE **94**(5), 952–985 (2006).

10. I. Lyubomirsky, “Advanced modulation formats for ultra-dense wavelength division multiplexing,” http://www.journalogy.net/Publication/5674594/advanced-modulation-formats-for-ultra-dense-wavelength-division-multiplexing

11. J.-P. Elbers, H. Wernz, H. Griesser, C. Glingener, A. Faerbert, S. Langenbach, N. Stojanovic, C. Dorschky, T. Kupfer, and C. Schulien, “Measurement of the dispersion tolerance of optical duobinary with an MLSE-receiver at 10.7 Gb/s,” Proc. OFC2005, OThJ4.

12. J. D. Downie and J. Hurley, “Chromatic dispersion compensation effectiveness of an MLSE-EDC receiver for three variants of duobinary,” IEEE/LEOS Summer Topical Meetings, TuA1.2 (2007).

13. A. Gorshtein, O. Levy, G. Katz, and D. Sadot, “Low cost 112G direct detection metro transmission system with reduced bandwidth (10G) components and MLSE compensation,” SPPCoM2011, SPWC4.

14. G. Katz and D. Sadot, “Channel estimators for maximum-likelihood sequence estimation in direct-detection optical communications,” Opt. Eng. **47**(4), 31–34 (2008).

15. O. E. Agazzi, M. R. Hueda, H. S. Carrer, and D. E. Crivelli, “Maximum-likelihood sequence estimation in dispersive optical channels,” J. Lightwave Technol. **23**(2), 749–763 (2005).

16. N. Stojanovic, “Tail extrapolation in MLSE receivers using nonparametric channel model estimation,” IEEE Trans. Signal Process. **57**(1), 270–278 (2009).

17. T. Foggi, E. Forestieri, G. Colavolpe, and G. Prati, “Maximum-likelihood sequence detection with closed-form metrics in OOK optical systems impaired by GVD and PMD,” J. Lightwave Technol. **24**(8), 3073–3087 (2006).

18. W. Chung, “Channel estimation methods based on Volterra kernels for MLSD in optical communication systems,” IEEE Photon. Technol. Lett. **22**(4), 224–226 (2010).

19. Y.-J. Jeng and C.-C. Yeh, “Cluster-based blind nonlinear-channel estimation,” IEEE Trans. Signal Process. **45**(5), 1161–1172 (1997).

20. E. Zervas, J. Proakis, and V. Eyuboglu, “A quantized channel approach to blind equalization,” in *Proc. ICC*, Chicago, IL, **3**, 1539–1543 (1992).

21. M. Ghosh and C. L. Weber, “Maximum likelihood blind equalization,” Opt. Eng. **31**(6), 1224–1228 (1992).

22. G. Agrawal, *Fiber-Optic Communication Systems* (John Wiley & Sons, Inc., 2002).

23. M. S. Alfiad, D. van den Borne, F. N. Hauske, A. Napoli, B. Lankl, A. M. J. Koonen, and H. de Waardt, “Dispersion tolerant 21.4 Gb/s DQPSK using simplified Gaussian Joint-Symbol MLSE,” Proc. OFC2008, paper OThO3.

24. F. N. Hauske, B. Lankl, C. Xie, and E. D. Schmidt, “Iterative electronic equalization utilizing low complexity MLSEs for 40 Gbit/s DQPSK modulation,” Proc. OFC2007, paper OMG2.

25. G. Box, W. G. Hunter, and J. S. Hunter, *Statistics for Experimenters* (Wiley-Interscience, 1978).

26. R. C. Sprinthall, *Basic Statistical Analysis* (Pearson Allyn & Bacon, 2011).