## Abstract

We experimentally demonstrate 20 Gbaud 256QAM and 40 Gbaud 128QAM in an all-silicon IQ modulator. We combine a linear equalizer and a nonlinear predistortion implemented in a lookup table (LUT). We achieve bit error rate (BER) below the 20% forward error correction threshold; linear equalization alone cannot achieve this performance. To keep LUT size manageable, we use one dimensional LUTs and prune entries. We achieve good BER even when LUT size is halved. Finally, we verify the generality of the proposed methods on different data sets.

© 2019 Optical Society of America under the terms of the OSA Open Access Publishing Agreement

## 1. Introduction

The volume of data processed in data centers is rising exponentially due to video services and machine-learning applications by companies such as Facebook, Google, etc. [1]. The growing demand for bandwidth, high speed and low cost data transmission has focused attention on silicon photonic (SiP) technology to scale up performance at smaller footprint, lower cost, and higher power efficiency as compared to other technologies. Furthermore, complementary metal-oxide semiconductor (CMOS) compatibility for mass production makes SiP based platforms a promising solution for advanced data center architectures [2].

The performance of SiP based IQ modulators has been investigated extensively in recent years, from the first experimental demonstration of quadrature amplitude modulation (QAM) transmission in quadrature phase shift keying (QPSK) format at 50 Gb/s in 2012 [3] to recent reports of high data rate transmission at 70 Gbaud 32QAM [4] and 100 Gbaud 32QAM [5]. Digital signal processing (DSP) techniques, in conjunction with improved SiP modulator design, are boosting performance [2].

Using high symbol rate (baud rate) signaling and advanced modulation formats enables increased capacity and spectral efficiency for single carrier optical transmission. However, pushing the optical systems into marginal operating regimes can pose technical challenges which limit end-to-end performance. One of the main sources of transmission impairments is electrical bandwidth limitations of the transceiver components, *i.e.*, digital-to-analog converters (DAC), RF amplifiers, and electro-optical (E/O) converters used with external modulators. Bandwidth limitations of these components lead to inter-symbol interference (ISI). RF amplifiers and Mach Zehnder modulators (MZMs) exhibit nonlinear behavior when driven at high peak-to-peak RF voltage, creating another source of distortion [6].

SiP modulators suffer from an additional source of nonlinearity compared with other MZMs. Unlike in lithium niobate (LiNbO_{3}) devices, the phase shifters in SiP modulators are not linear in applied voltage [7]. This introduces extra distortion as the peak-to-peak RF voltage gets closer to 2*V*_{π} of the modulator operating at the null point (*V*_{π}: half-wave voltage of the modulator). The combination of ISI and nonlinearity causes the sampled received signal to cluster (ISI effect) at constellation points that are not aligned on a rectangular grid (nonlinearity); this is known as constellation warping [8]. These combined effects (ISI and nonlinearity) create pattern dependent distortion that cannot be entirely compensated by a linear equalizer [9]. The impairments become more severe at higher modulation formats due to the reduced Euclidean distance between symbols [6].

Digital predistortion (DPD) algorithms are employed at the transmitter side to reduce the penalty induced by non-ideal components; this allows us to increase the spectral efficiency and baud rate, and offers more degrees of freedom in the trade-off of forward error correction overhead [6]. Various DPD algorithms have been proposed that can be used in a wide variety of optical communications platforms (*e.g.*, SiP or LiNbO_{3}).

In [10], an adaptive method based on an indirect learning architecture is used to estimate the coefficients of a DPD module based on a memory polynomial model of the modulator. Linear and nonlinear DPD are compared in [11] using a Volterra series up to 3^{rd} order for estimation of the nonlinear DPD module. While popular, these solutions require high complexity in terms of coefficient extraction and hardware implementation [12]. Furthermore, the required number of coefficients rises exponentially at high data rate, as more memory and higher order are needed to build an effective DPD module [13]. Recently, we demonstrated an iterative learning control technique using quasi-real-time adaptation for a SiP IQ modulator to compensate nonlinear distortion at several baud rates (20 Gbaud to 60 Gbaud) for various QAM modulation formats [14]. Although the method is promising for a wide range of modulation formats and baud rates, it needs a repetitive data sequence for convergence. To exploit the predistortion method on arbitrary data sequences, a behavioral method, *e.g.*, a memory polynomial based DPD, is required.

Lookup table (LUT)-based DPD is another well-known approach for mitigating pattern dependent distortion, that is, ISI combined with nonlinearity in optical transmission [15–19]. A LUT maps an input symbol to a modified constellation point to bring the received symbol closer to the desired (*i.e.*, input) constellation point. The LUT-based compensation method can be used alone or combined with other compensation techniques. For instance, LUT-based DPD was implemented to enhance direct detection transmission through an integrated distributed feedback laser with an MZM [15]. Wireless transmission at 1 Tb/s with 4×4 multiple-input multiple-output 64QAM was achieved by combining a LUT with probabilistic shaping [16]. In [17], a receiver side LUT with optical pulse shaping mitigates pattern dependent distortion. Two one-dimensional (1D) LUTs were used to separately compensate in-phase and quadrature components. A LUT-based maximum a posteriori (MAP) DPD was proposed in [18] to compensate the ISI effect caused by the limited DAC bandwidth. In [19], a joint DPD approach was proposed for pulse amplitude modulation (PAM) at the transmitter side. The joint DPD includes a linear equalizer and a LUT to overcome linear and nonlinear distortion, respectively. In addition, for higher order PAM, *i.e.*, 16PAM, a modified LUT was proposed including entries only for higher amplitude symbols (*i.e.*, ±5, ±7) to reduce the LUT size by half.

In this work, we propose a two-step DPD, combining a linear minimum mean square error (MMSE) pre-equalizer and a full-complexity pair of 1D-LUTs. We examine them experimentally with back-to-back transmission at 20 Gbaud 256QAM and 40 Gbaud 128QAM using a SiP IQ modulator, and report performance improvement. We introduce a reduced-size LUT-based DPD and observe comparable performance when nearly halving the table size for 20 Gbaud 256QAM and 40 Gbaud 128QAM. Unlike the work presented in [19], we do not limit the LUT to outer symbols only (a one-shot approach), but instead include memory effects. We build the LUT tables based on the error caused by patterns which can include both inner and outer symbols. Next, we validate the populated reduced-size LUTs for other random sequences to show the generality of the proposed methods.

The remainder of this paper is organized as follows. Section 2 describes the LUT-based predistortion principle applied to the SiP IQ modulator. Section 3 presents the experimental set-up for SiP IQ modulator signal generation and discusses the DSP algorithm applied to recover data. In section 4, the source of nonlinear distortion observed during the experiment is discussed. In section 5 we present experimental results on a full-size LUT. In section 6, the reduced-size LUT is introduced, followed by the experimental investigation of its performance. In section 7, we examine LUT performance for various data sets to verify the generality of the proposed methods. We offer concluding remarks in section 8.

In this paper, we extend our preliminary results on 20 Gbaud 256QAM in [20] to present experimental results for 40 Gbaud 128QAM. We examine the impact of severe bandwidth limited operation on the nonlinear LUT compensation. We present much greater detail on our LUT method, contrast it with previous approaches, and validate on multiple data sets.

## 2. LUT-based predistortion methodology

Consider a lookup table that implements a predistortion to correct for channel memory of length $n$, where $n$ is an odd integer. Let $l$ be an index to all length-$n$ patterns of symbols. Symbols can take any of $M$ values from an $M$-QAM constellation. Let $P(l)$ be the $l$th pattern of $n$ symbols that represents the LUT address. For each pattern, we populate the LUT with an entry that is to be subtracted from the middle symbol of the pattern before transmission, *i.e.*, the predistortion term is applied to one symbol based on the symbols that precede and follow it. Let $X_k$ be the symbol to be transmitted at time instant $k$. During transmission, a sliding window is passed over the data to be transmitted, isolating the pattern represented by the vector

$$\underline{X}_k = \left[X_{k-(n-1)/2}, \ldots, X_k, \ldots, X_{k+(n-1)/2}\right]. \tag{1}$$

The index $l$ is found for pattern $\underline{X}_k$; the LUT entry at $l$ is then subtracted from $X_k$ before transmission.

The LUT address and entries are two-dimensional (2D) when symbols are complex, as is the case with QAM modulation with in-phase and quadrature coordinates. The LUT is one-dimensional (1D) for real symbols, such as for PAM modulation. One of the greatest challenges in working with a LUT is the memory required, especially in QAM modulation. While suboptimal, we can apply predistortion in the form of two independent 1D LUTs instead of a 2D LUT. For $M$-QAM modulation, LUT size is reduced from $M^{n}$ to $2M^{n/2}$, or roughly by the square root [21]. For two 1D LUTs we have index $i$ to the pattern in the in-phase branch, and index $q$ for the quadrature branch pattern, where $P(l) = P_I(i) + jP_Q(q)$.
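As a rough check of this size reduction, the short Python sketch below compares the number of addresses in one 2D LUT with the total for two 1D LUTs. It assumes a square constellation with √M amplitude levels per dimension; the function name `lut_sizes` is introduced here for illustration only.

```python
import math

def lut_sizes(M, n):
    """Return (2D LUT size, total size of two 1D LUTs) for square M-QAM."""
    levels = math.isqrt(M)            # amplitude levels per dimension
    if levels * levels != M:
        raise ValueError("this sketch assumes a square constellation")
    size_2d = M ** n                  # one entry per complex pattern
    size_1d_pair = 2 * levels ** n    # I-LUT plus Q-LUT, i.e. 2 * M**(n/2)
    return size_2d, size_1d_pair

for n in (3, 5, 7):
    print(n, lut_sizes(256, n))
```

For 256QAM with a memory of three symbols, this gives 16,777,216 addresses for the 2D LUT against 8192 for the 1D pair (4096 per branch).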

A 2D LUT assigns unique corrections for each symbol pattern *l*. Using two 1D LUTs for M-QAM leads to quite different behavior. There could be many symbol patterns with the same I coordinate sequence; although the Q coordinate sequences could be very different, the I correction would be the same for all these symbol patterns. The use of two 1D LUTs loses the ability to exploit correlations in IQ distortions; while the solution is suboptimal, it is tractable.

We adopt two 1D LUTs as presented in Fig. 1, where the transmitted and received data sequences are *X̱* and *Y̱*, respectively. We have a training phase (see blue, outlined branches in block diagram) during which the LUT entries are found, and an operating phase (see gray, shaded branches in block diagram) during which performance is assessed.

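During the training phase, at each symbol interval two sliding windows of length $n$, $SW(n)$, output indexes $(i, q)$ into two separate 1D LUTs: index $i$ to the I-LUT-$n$ for the I component of the symbol sequence, and index $q$ to the Q-LUT-$n$ for the Q component. The transmission output at symbol time $k$, $Y_k$, is used to calculate a complex error, defined as the difference between the complex output and input symbols,

$$e_k = Y_k - X_k, \tag{2}$$

whose real and imaginary parts are associated with the patterns at indexes $i$ and $q$, respectively. We assume that the nonlinear distortion is an additive quantity that is pattern dependent. Since the training sequence transmitted may contain several instances of the same 1D patterns, we average over multiple occurrences before populating the LUT arrays. The LUT entries are therefore

$$\text{I-LUT-}n(i) = \frac{1}{N_i}\sum_{k \in \mathcal{K}_i} \mathrm{Re}\{e_k\}, \qquad \text{Q-LUT-}n(q) = \frac{1}{N_q}\sum_{k \in \mathcal{K}_q} \mathrm{Im}\{e_k\}, \tag{3}$$

where $\mathcal{K}_i$ ($\mathcal{K}_q$) is the set of symbol times at which the in-phase (quadrature) window matches pattern $i$ ($q$), and $N_i$ ($N_q$) is the number of times pattern $i$ ($q$) occurs.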
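The training procedure described above can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: it assumes the error is taken as output minus input (so that the stored entry is later subtracted at the transmitter), it uses the tuple of amplitude levels itself as the LUT address, and the name `train_1d_luts` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

def train_1d_luts(x, y, n):
    """Average the pattern-dependent error into an I-LUT and a Q-LUT.

    x, y : lists of complex transmitted / received symbols (same length)
    n    : odd sliding-window length (LUT memory depth)
    Returns two dicts mapping a tuple of n levels to the mean error.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    half = n // 2
    sum_i, sum_q = defaultdict(float), defaultdict(float)
    cnt_i, cnt_q = defaultdict(int), defaultdict(int)
    for k in range(half, len(x) - half):
        window = x[k - half:k + half + 1]
        err = y[k] - x[k]                      # assumed sign: output minus input
        pat_i = tuple(s.real for s in window)  # address into the I-LUT
        pat_q = tuple(s.imag for s in window)  # address into the Q-LUT
        sum_i[pat_i] += err.real
        cnt_i[pat_i] += 1
        sum_q[pat_q] += err.imag
        cnt_q[pat_q] += 1
    # Average over the N_i (N_q) occurrences of each pattern.
    i_lut = {p: sum_i[p] / cnt_i[p] for p in sum_i}
    q_lut = {p: sum_q[p] / cnt_q[p] for p in sum_q}
    return i_lut, q_lut
```

Using floating-point tuples as dictionary keys works here because the transmitted symbols sit exactly on ideal constellation levels; a hardware LUT would more likely use an integer address.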

During the operating phase, the input data sequence $\underline{X}_k$ again goes through sliding windows to find the proper index $(i, q)$. A predistorted value for the $k$th symbol, $\tilde{X}_k$, is formulated using the newly populated LUTs, *i.e.*, by applying the indexed correction factors for the I and Q branches from the I-LUT and Q-LUT, as seen in Fig. 1.
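A matching sketch of this phase is given below. It assumes the LUTs are dictionaries keyed by tuples of amplitude levels, that a pattern with no LUT entry receives no correction, and that the first and last symbols (which lack a full window) pass through unchanged; these choices and the name `predistort` are illustrative assumptions rather than details taken from the experiment.

```python
def predistort(x, i_lut, q_lut, n):
    """Subtract the I/Q LUT corrections from the middle symbol of each window."""
    half = n // 2
    out = list(x)                               # edge symbols pass through unchanged
    for k in range(half, len(x) - half):
        window = x[k - half:k + half + 1]
        pat_i = tuple(s.real for s in window)   # address into the I-LUT
        pat_q = tuple(s.imag for s in window)   # address into the Q-LUT
        corr = complex(i_lut.get(pat_i, 0.0),   # assumed: no entry, no correction
                       q_lut.get(pat_q, 0.0))
        out[k] = x[k] - corr
    return out
```

Because each branch is looked up independently, all symbol patterns sharing the same in-phase sequence receive the same in-phase correction, whatever their quadrature sequence.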

## 3. Experimental set-up and digital signal processing

The experiment to measure the SiP IQ transmitter response and populate the two 1D LUTs is presented in Fig. 2. The grey shading indicates the optical paths; all other paths are electrical. The transmitter and receiver side DSP is summarized in tables to either side. The SiP IQ modulator has a single drive, push-pull configuration that is reverse biased at 0.75 V. We measured the 3 dB bandwidth at various DC biases with a vector network analyzer. At 0.75 V, it is approximately 32 GHz (see inset in Fig. 2); *V*_{π} is approximately 7.25 V. The operating point and bias mode are controlled through DC voltage sources. We operate the SiP IQ modulator at the null point. Further details regarding the modulator can be found in [4].

Due to high packaging losses in working with the SiP chip (9 dB coupling loss from the fiber array to the I/Q grating couplers, 8 dB modulator loss, and 3 dB splitting loss from the on-chip adiabatic 50:50 coupler), several erbium doped fiber amplifiers (EDFAs) are used. A tunable external cavity laser (ECL) at 1530 nm with output power of 10 dBm and linewidth less than 100 kHz is boosted by a high power EDFA to 22 dBm. The optical output of the SiP IQ modulator is amplified by a two stage EDFA; two optical bandpass filters (OBPF) suppress out-of-band amplified spontaneous emission noise.

Consider the transmitter side DSP chain. A pseudo random bit sequence (PRBS) of length 2^{24} − 1 (PRBS24) is Gray mapped and denoted *X̱* in Fig. 2. In the training phase the LUT block is not engaged. We apply a 500-tap MMSE pre-equalizer whose output is *X̱*_{DPD}; this signal is upsampled and shaped to a raised cosine pulse with roll-off of 1. The shaped pulse is resampled to accommodate the 84 GSa/s Fujitsu DAC (8-bit, 18 GHz) for operation at the desired baud rate (20 or 40 Gbaud). The resampled symbol sequence is clipped before being uploaded to the DAC.
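For readers unfamiliar with PRBS generation, the sketch below shows a generic Fibonacci linear-feedback shift register. The generator polynomial used for PRBS24 in the experiment is not stated in the text, so the example uses the widely used PRBS7 polynomial x^7 + x^6 + 1 as a small stand-in; the function name `prbs_bits` and the seed value are assumptions made for illustration.

```python
def prbs_bits(order, taps, length, seed=1):
    """Generate `length` bits from a Fibonacci LFSR.

    order : register length in bits (sequence period is 2**order - 1
            when the feedback polynomial is primitive)
    taps  : 1-based register positions that are XORed to form the feedback
    seed  : any nonzero initial register state
    """
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be nonzero")
    out = []
    for _ in range(length):
        bit = 0
        for t in taps:
            bit ^= (state >> (t - 1)) & 1
        state = ((state << 1) | bit) & mask   # shift in the feedback bit
        out.append(bit)
    return out

# PRBS7 stand-in: taps (7, 6), period 2**7 - 1 = 127 bits.
bits = prbs_bits(7, (7, 6), 254)
print(bits[:127] == bits[127:])
```

In the experiment the bit stream would then be Gray mapped to QAM symbols; that mapping is not reproduced here.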

The two DAC outputs, corresponding to the in-phase and quadrature components of an M-QAM signal, are passed through tunable RF phase shifters (PS) to synchronize them in time before being amplified with two RF power amplifiers (SHF, 50 GHz) to achieve a 5 V peak-to-peak swing (at 20 Gbaud). We use a variable optical attenuator (VOA) to sweep the received optical power. A discrete coherent receiver (CoRx) with 70 GHz bandwidth and a local oscillator (LO) with 16.3 dBm power are used for reception. Electrical outputs are digitized by a 160 GSa/s, 60 GHz, real-time oscilloscope (RTO) from Keysight.

Offline DSP is applied to recover data, as illustrated in Fig. 2. A 4^{th} order Butterworth low-pass filter (LPF) suppresses noise at the RTO sampling rate, then we downsample to one sample per symbol. We use fast Fourier transform (FFT)-based frequency offset compensation (FOC) and a blind phase search with 64 test angles for carrier phase recovery (CR). A 500-tap linear MMSE equalizer is then used for post-compensation.
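The text does not specify which FFT-based estimator is used for FOC. One common choice, assumed here purely for illustration, raises the symbols to the fourth power to strip the modulation of a square QAM signal and then locates the spectral peak at four times the frequency offset. The function names are hypothetical and NumPy is assumed to be available.

```python
import numpy as np

def estimate_freq_offset(symbols, symbol_rate):
    """Fourth-power FFT estimate of the carrier frequency offset in Hz.

    Works on one sample per symbol; unambiguous only for
    |offset| < symbol_rate / 8, with resolution symbol_rate / (4 * N).
    """
    s = np.asarray(symbols, dtype=complex)
    spectrum = np.abs(np.fft.fft(s ** 4))
    freqs = np.fft.fftfreq(s.size, d=1.0 / symbol_rate)
    return freqs[np.argmax(spectrum)] / 4.0

def compensate_freq_offset(symbols, symbol_rate, offset_hz):
    """Remove a frequency offset by counter-rotating the symbols."""
    k = np.arange(len(symbols))
    return np.asarray(symbols) * np.exp(-2j * np.pi * offset_hz * k / symbol_rate)
```

The residual phase left after this coarse step is what the blind phase search then tracks.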

During the training phase, the LUTs are populated as described in section 2. During the operating phase, the LUT block is applied before the linear MMSE pre-equalizer in the transmitter side DSP. We consider three sizes of sliding windows, or equivalently, three lengths of the LUT address: sequences of length 3, 5 or 7.

## 4. Amplifier versus modulator as a source of distortion

We applied linear pre-equalization at the transmitter and linear post-equalization at the receiver (both minimizing mean squared error) using the setup described in the previous section. At 20 Gbaud, 256QAM was transmitted, while at 40 Gbaud, 128QAM was transmitted. Figures 3(a) and 3(b) present the recovered constellations; neither is symmetric. In Fig. 3(c) and 3(d) the average absolute error on each constellation point is plotted in 3D. Both plots exhibit nonuniform error. At 20 Gbaud, the nonuniformity in the error is most pronounced (see the top left section of the plot versus the bottom right side).

The asymmetric constellations could have their origin in: 1) RF amplifier saturation, 2) fabrication errors in the modulator (*e.g.*, different I and Q phase responses), and 3) modulator nonlinear frequency response (dependence on bias voltage). The fabrication error effect is negligible compared to the others, since *V*_{π} is much larger than the driving voltage. The first effect is RF only and can be confirmed with electrical back-to-back measurements. The irregular constellation at 20 Gbaud in Fig. 3(a) resembles that in the electrical back-to-back measurement; this is not the case for the 40 Gbaud constellation. At 40 Gbaud the output of the DAC falls below the maximum 500 mV output, hence leading to less RF amplifier saturation. We conclude that at 20 Gbaud the RF amplifier nonlinearity dominates.

At 40 Gbaud we are probing the roll-off of the silicon modulator bandwidth. The peak-to-peak voltage, V_{pp}, of the RF amplifier is about 5 V. Since the driving configuration is series push-pull, half of V_{pp} is applied to each branch (I and Q). Given the bias voltage of −0.75 V in our experiments, the voltage on each pn junction swings [22] from −2 V to 0.5 V. As seen in the inset of Fig. 2, the small signal 3 dB bandwidth around each of these peak voltages varies greatly (between 20 and 32 GHz). Therefore, at 40 Gbaud the dynamic nonlinear response of the modulator has greater impact than it does at 20 Gbaud. While some RF amplifier distortion may still be present at 40 Gbaud, the optical sources of nonlinearity have more impact on the asymmetry of the plot in Fig. 3(d).
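The quoted junction voltage range follows from simple arithmetic, assuming the 2.5 V peak-to-peak swing seen by each pn junction is centered on the DC bias:

$$
\frac{V_{pp}}{2} = 2.5\ \text{V}, \qquad
V_{\min} = -0.75\ \text{V} - \frac{2.5\ \text{V}}{2} = -2\ \text{V}, \qquad
V_{\max} = -0.75\ \text{V} + \frac{2.5\ \text{V}}{2} = 0.5\ \text{V}.
$$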

## 5. Experimental results for full-size LUT

This section presents experimental results for full-size LUT. We always present illustrations for the in-phase LUT, as the quadrature LUT yields similar results. We first discuss experimental challenges when working with LUTs. We define pattern dependent distortion (PDD) and examine its dependence on LUT memory depth and system baud rates. The bit error rate (BER) when using LUTs of various memory depths is compared to that when using a linear pre-equalizer alone.

#### 5.1. Full-size LUT experimental challenges

We use a PRBS to train and test the effectiveness of our LUT solution. Due to hardware limitations, we obtain most results for PRBS24; other length PRBS are discussed in section 7. A recovered symbol sequence of fixed length (regardless of LUT memory depth) is used to populate the LUTs. For short LUT memory depth (*n* = 3) we see multiple instances of all unique patterns. For long LUT memory, we see multiple instances of only a small subset of unique patterns. Table 1 summarizes the number of unique sequences in random data vs. PRBS24 for the two modulation formats examined. Clearly size *n* = 7 would lead to excessively large LUTs, but results are included for comparison. A practical LUT would be trained over all patterns, not just those in a PRBS. In section 7 we will run trials using various PRBS lengths (each seeing different subsets of unique patterns) and validate the generality of our results.

In an ideal world, the LUT memory depth (*i.e.*, address length *n*) would equal the system memory in symbol intervals. When sweeping *n*, LUT performance improvement would plateau when we reach this mark. Table 1 also indicates the number of times each pattern is repeated (on average, as it varies slightly by pattern), corresponding to *N*_{i} in the averaging seen in (3). Having a high number of repeated patterns helps with noise averaging and limits estimation error. In our experimental situation, the larger *n* = 5 and *n* = 7 cases see less noise averaging than *n* = 3, hence less accurate predistortion. The two effects, noise averaging vs. limited memory depth, come into play in evaluating performance.
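The trade-off can be made concrete by counting, for a given training sequence, how many distinct 1D patterns occur and how often each repeats on average. The helper below is a hypothetical sketch operating on a list of amplitude levels for one branch; it is not the procedure used to build Table 1.

```python
from collections import Counter

def pattern_stats(levels, n):
    """Count unique length-n 1D patterns and their mean number of repeats."""
    counts = Counter(tuple(levels[k:k + n]) for k in range(len(levels) - n + 1))
    unique = len(counts)
    mean_repeats = sum(counts.values()) / unique
    return unique, mean_repeats

# Toy example: an alternating two-level sequence.
print(pattern_stats([0, 1, 0, 1, 0, 1], 3))
```

For a fixed sequence length, the number of possible patterns grows rapidly with the window length, so the mean repeat count drops; for independent noise, the noise on an averaged entry falls roughly as the inverse square root of that count.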

#### 5.2. Pattern dependent distortion

We refer to error for the LUT, *e.g.*, I-LUT-*n*(*i*), as PDD, where *i* is an index to a unique pattern. Contrast this with the error in Fig. 3 which is for a constellation symbol rather than for a symbol in the center of a pattern. Figure 4 presents PDD for *n*=3, 5 and 7 for both 20 Gbaud 256QAM and 40 Gbaud 128QAM. The value of mean, *μ*, and standard deviation, *σ*, are shown as an inset for each case. The standard deviation for *n* = 3 is lowest among all memory depths, consistent with our observation of better noise averaging for this case.

The mean value of the PDD is lower for the 40 Gbaud 128QAM case than for 20 Gbaud 256QAM. Thus, the greater asymmetric distortion at 20 Gbaud 256QAM seen earlier for individual symbols (Fig. 3, *i.e.*, “sequences” of length one) holds true for all sequence lengths examined.

For the 20 Gbaud 256QAM case, *σ* and *μ* are comparable for *n* = 5 and *n* = 7, suggesting the system nonlinearity may have depth of five symbols. However, these comparable results might also be attributable to *n* = 5 and *n* = 7 having the same number of unique patterns in PRBS24 (Table 1). For the 40 Gbaud 128QAM case, *σ* rises for *n* = 7. In consulting Table 1, there is the least noise averaging for this case. The standard deviation is consistently higher for 20 Gbaud 256QAM, probably as a result of the greater RF amplifier nonlinearity in this case where the drive voltage saturates.

#### 5.3. Full-size LUT BER performance

After applying the LUT, we assess performance using constellations (for a qualitative assessment) and bit error rate (for a quantitative assessment) rather than rely on the resulting new PDD. Figure 5 presents BER versus received optical power. Curves are included for linear pre-equalization alone (green, stars) and for LUT memory depth of *n* = 3 (black, crosses), 5 (red, squares) and 7 (blue, circles). Figure 5(a) is for 20 Gbaud 256QAM and Fig. 5(b) for 40 Gbaud 128QAM. In both cases, when using linear pre-equalization alone the BER is above the 20% overhead forward error correction (FEC) threshold of BER = 2.4×10^{−2}. Use of a LUT with *n* = 3 pushes BER below the threshold, including at received optical power as low as approximately −6 dBm. Increasing to *n* = 5 and *n* = 7 further improves the performance. The minor improvement at *n* = 7 would not justify the extremely high jump in LUT complexity. Nearly identical performance for *n* = 5 and *n* = 7 for most received powers in Fig. 5(a) tracks with our observation of similar PDD mean and variance, see Fig. 4(a), for these cases. For 40 Gbaud 128QAM, LUT-5 can increase the power sensitivity by more than 3 dB compared to the linear pre-equalizer alone.
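The comparison against the FEC threshold amounts to counting bit errors and checking the ratio against 2.4×10^{−2}. A minimal sketch, with hypothetical function names and bit sequences represented as plain lists, is:

```python
FEC_THRESHOLD_20PCT = 2.4e-2  # 20% overhead FEC threshold quoted in the text

def bit_error_rate(tx_bits, rx_bits):
    """Fraction of positions where the received bit differs from the sent bit."""
    if len(tx_bits) != len(rx_bits) or len(tx_bits) == 0:
        raise ValueError("bit sequences must be non-empty and of equal length")
    errors = sum(t != r for t, r in zip(tx_bits, rx_bits))
    return errors / len(tx_bits)

def below_fec_threshold(ber, threshold=FEC_THRESHOLD_20PCT):
    """True when the pre-FEC BER is under the threshold."""
    return ber < threshold
```

In practice the BER would be computed after Gray demapping of the equalized symbols, over enough bits to give a statistically meaningful error count.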

We focus on the case of LUT-7 to examine how the predistortion varies for the two cases. Figure 6(a) and 6(d) show the predistorted constellations for 20 Gbaud 256QAM and 40 Gbaud 128QAM, respectively. Instead of distinct points, we see clouds corresponding to different corrections depending on the preceding and following symbols. The mean of the absolute value of the predistortion per constellation point is shown in a colormap in Fig. 6(b) and 6(e); taking an average over the symbol-by-symbol means would give overall constellation means different from the 0.19 and 0.003 reported in Fig. 4(c) and 4(f), respectively, as here we examine the absolute value of the PDD. The standard deviation of the absolute value of the predistortion per constellation point is shown in a colormap in Fig. 6(c) and 6(f).
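The per-point statistics behind such colormaps can be computed by grouping the applied corrections by their ideal constellation point. The sketch below is an illustration only: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from collections import defaultdict
from statistics import mean, pstdev

def per_point_stats(symbols, corrections):
    """Mean and standard deviation of |correction| for each constellation point.

    symbols     : ideal complex symbols X_k
    corrections : complex predistortion applied to each symbol
    """
    groups = defaultdict(list)
    for s, c in zip(symbols, corrections):
        groups[s].append(abs(c))
    return {s: (mean(v), pstdev(v)) for s, v in groups.items()}
```

A large mean with a small standard deviation indicates a correction that depends mainly on the symbol itself, whereas a large standard deviation indicates a strong dependence on the neighboring symbols.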

Contrast the structure of the mean of the 20 Gbaud 256QAM predistortion in Fig. 6(b) with its standard deviation in Fig. 6(c). The mean shows that constellation points with coefficient −15 see the largest predistortion. The standard deviation is instead mostly uniform across the constellation. This is consistent with our previous observation that RF amplifier saturation is the dominant source of nonlinearity. Certain points are saturated, but the patterning (or memory) effect is the same despite saturation.

Consider now the case of 40 Gbaud 128QAM. The symmetry of the mean predistortion has changed. All outer symbols see increased predistortion. The standard deviation is quite nonuniform, unlike the previous case. The top right corner of Fig. 6(f) sees much greater variance, because the memory length of the LUT is insufficient to capture the memory effects introduced by the modulator being driven at the extreme of its bandwidth. This nonlinear ISI effect comes from the dynamic behavior of the silicon modulator, in which the voltage applied to the pn junction affects the bandwidth of the modulator.

## 6. Experimental results for reduced-size LUTs

A full-size LUT is one containing all 1D sequences in PRBS24 for the given constellation. A reduced-size LUT (R-LUT) is one with some subset of the full-size LUT entries removed; its performance is discussed in this section. In R-LUT, for a given memory depth, the LUT size is reduced by eliminating patterns with small PDD values. These omissions have little impact on LUT performance. R-LUT methodology is discussed and BER performance is presented for 20 Gbaud 256QAM and 40 Gbaud 128QAM.

#### 6.1. R-LUT implementation description

We observe in Fig. 3 that outer constellation points suffer from more distortion than the inner ones. As all constellation points suffer different levels of distortion, one solution to reduce LUT size is to exclude all patterns except those whose center symbol is one of the vulnerable constellation points. This was the strategy in [19] for PAM-8 transmission where outer points were much more distorted than all others. Our solution is more nuanced. We propose to take into account memory effects for those most vulnerable symbols. For instance, in our case, although on average the inner constellation points have low error, some patterns may see high PDD for such a constellation point. We therefore examine patterns to determine which are most vulnerable, which includes the influence of the system memory.

Figure 7(a) presents the absolute value of the PDD of the in-phase LUT with respect to the pattern index (patterns are indexed in no particular order). There is high variance across different patterns, and we target patterns with less error to be eliminated from the LUT. The number of patterns remaining after elimination, divided by the number of patterns in the full-size LUT, describes the R-LUT size in percentage. We set four threshold levels |I-LUT(*i*)| ⩾ *η*_{j}, giving R-LUT sizes ranging from 20% to 80% of the full-size LUT, where I-LUT(*i*) is the average in-phase error per (3). The same approach is used for the quadrature component of the error. Thresholds are plotted for PDD in Fig. 7(a), and a graphical representation of the R-LUT size is shown on the right. In Fig. 7(b) we plot the distribution of center symbols retained in the LUT for *η*_{1} = 20%. For the full-size LUT, this distribution (not shown) is uniform with 1/16 of LUT entries for each symbol. Comparing Fig. 7(b) with Fig. 3(c), we find some similar trends, but also significant differences. Figure 3(c) has different in-phase (I) error distributions for each quadrature (Q); the two 1D LUTs are incapable of seeing this effect. However, Fig. 3(c) is an average effect (over all patterns), while the nonuniform distribution in Fig. 7(b) indicates excursions (or outliers) from that average. The in-phase and quadrature R-LUTs exhibit similar retained patterns when reducing LUT size.
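The pruning rule can be sketched as follows, with the threshold expressed through the fraction of entries to retain rather than as an absolute PDD level. The function name, the dictionary representation of the LUT, and the handling of ties at the threshold (cut by rank) are assumptions made for illustration.

```python
def prune_lut(lut, keep_fraction):
    """Keep the entries with the largest |PDD| until keep_fraction remains.

    lut           : dict mapping a pattern to its averaged error (PDD)
    keep_fraction : R-LUT size as a fraction of the full-size LUT, in (0, 1]
    Returns the reduced LUT and the implied threshold on |PDD|.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    n_keep = max(1, round(keep_fraction * len(lut)))
    ranked = sorted(lut.items(), key=lambda item: abs(item[1]), reverse=True)
    threshold = abs(ranked[n_keep - 1][1])   # plays the role of eta_j
    return dict(ranked[:n_keep]), threshold
```

At run time, a pattern that is absent from the reduced LUT simply receives no nonlinear correction, which is why discarding low-PDD entries costs little.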

We observed in Fig. 7(b) nonuniform distributions of middle symbols in retained patterns for the R-LUT. Of further interest is the distribution of other symbols (not only middle) in these retained patterns. The transition from the first-to-middle-to-third symbol may also be significant in determining which patterns are retained in the LUT. In Fig. 8, we select two groups of patterns with high retention in the R-LUT: a corner symbol (middle symbol of −15) and a central symbol (middle symbol of −5).

The distributions of the first and third symbols are reported in Fig. 8(a) and 8(b) for center symbols −15 and −5, respectively. As can be seen in Fig. 7(b), almost all patterns with middle symbol of −15 are retained by the 20% R-LUT. Thus, we expect a uniform distribution of the first and third symbols in these patterns; this is verified in Fig. 8(a). Fewer patterns with middle symbol of −5 are retained in the 20% R-LUT than with middle symbol −15. For middle symbol −5, we observe a bowl-shaped distribution in Fig. 7(b). Hence patterns that transition from one corner (−15) to the central area (−5) and then again to the corner have the highest density. Central symbols in the first and third positions are less likely, *i.e.*, the bottom of the bowl is around −5.

In Fig. 9(a), we plot PDD (not absolute PDD) for the full-size LUT. We sort the pattern indexes to group together PDD for indexes with common center symbols (amplitudes from −15 to 15). Contrast this with Fig. 7(a), with unsorted pattern indexes. For each center symbol, we also find the average PDD, and plot it with round markers in Fig. 9(a). The average is taken over 256 patterns, the same for all symbols, as all patterns containing each center symbol are included in the full-size LUT.

The plot for the 20% R-LUT DPD in Fig. 9(b) has averages calculated over the number of patterns retained for each center symbol, a number that varies per Fig. 7(b). The width of the pattern groupings (alternating grey and white) in Fig. 9(b) is proportional to the number of patterns retained, *i.e.*, the height in Fig. 7(b). We can see that the R-LUT has a higher average error for each pattern grouping, as only the most vulnerable (high PDD) patterns are retained. The distribution is highly nonuniform; more than 90% of patterns with center symbol of −15 are selected, and symbols 5 and 15 are virtually excluded. However, a considerable percentage of inner symbols (*e.g.*, −7 to 1) is retained. This behavior indicates that to reduce the LUT size, we need to prune the LUT using patterning effects, and not prune it on average constellations alone as in [19].

#### 6.2. R-LUT performance evaluation

Performance at 2 dBm received optical power for the in-phase R-LUT with *n* = 3 at 20 Gbaud 256QAM and *n* = 5 at 40 Gbaud 128QAM is presented in Fig. 10(a) and 10(b), respectively. BER curves (× markers) are plotted versus R-LUT size in percentage. The absolute size of the R-LUT can be read on the y-axis in the curve with square markers. Linear pre-equalization alone is represented at R-LUT 0%.

Even R-LUTs as small as 20% can move the BER below the 20% overhead FEC threshold in both cases. Expanding the R-LUT size leads to improved performance, until saturation at around 60% to 70%; after this point larger LUTs do not lead to improved performance. For 20 Gbaud 256QAM, using about 60% of the LUT (2457 patterns with *n* = 3) offers nearly the same performance as the full-size LUT (4096 patterns). The 40 Gbaud system requires *n* = 5, leading to a larger absolute LUT size, even though the constellation is smaller than in the 20 Gbaud system. Even in this system with longer memory, the LUT size can be reduced from 9.25×10^{4} to 6.47×10^{4} entries with virtually no compromise in performance.

## 7. Validating LUT performance on PRBS data sets

In all preceding results, we populated the LUT tables using PRBS24, and measured BER on transmissions of PRBS24 data sets. In this section, we verify that PRBS order does not impact our results. Although larger LUT sizes (e.g., *n* ≥ 5) offer lower BER, we choose *n* = 3 for our validation. A LUT for random data would need to be very large for *n* = 5, and none of our experimentally available PRBS sequences would cover all LUT entries. In selecting LUT-3 each PRBS tested will call on the entire set of LUT entries. We take the LUT-3 tables populated using PRBS24, but this time use test sequences different from the training sequence and confirm that the BER improvement remains.
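Whether a given test sequence exercises every LUT-3 entry can be checked directly by comparing the set of patterns it contains with the LUT addresses. The helper below is a hypothetical sketch operating on the amplitude levels of one branch, with the LUT stored as a dictionary keyed by pattern tuples.

```python
def lut_coverage(levels, lut, n):
    """Fraction of LUT addresses that appear at least once in a test sequence."""
    seen = {tuple(levels[k:k + n]) for k in range(len(levels) - n + 1)}
    return len(seen & set(lut)) / len(lut)

# Toy example with a three-entry LUT and a short test sequence.
toy_lut = {(0, 1, 0): 0.1, (1, 0, 1): -0.1, (1, 1, 1): 0.0}
print(lut_coverage([0, 1, 0, 1], toy_lut, 3))
```

A coverage of 1.0 means that the test sequence calls on the entire LUT, which is the situation targeted here by choosing a memory of three symbols.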

Our test sequences are taken from different PRBS lengths: PRBS14 at length (2^{14} − 1) through PRBS26 at length (2^{26} − 1). We treat the cases R-LUT-3(60%) and full-size LUT, *i.e.*, R-LUT(100%), for 256QAM at 20 Gbaud. We use the LUTs trained on PRBS24 from the previous experiments, no matter what test sequence we use. Our previous results used the same data for BER calculation as for LUT training; here the BER is calculated over different data sets.

The BER was measured at 2 dBm received optical power. We examine the use of linear equalization alone, as well as linear equalization combined with a full-size LUT or a 60% R-LUT. In Fig. 11, the BER is plotted as we sweep the PRBS length of data transmitted. In all cases, BER results are unaffected by the choice of PRBS length for testing, both longer and shorter than the PRBS used for training.

We do not have an accurate assessment of true system memory, as it is embedded in many other effects. However, it is reasonable to assume it is greater than three symbols (the length of the LUT tested in this section). The LUT-3 entries obtained from PRBS24 (containing all length-3 sequences) may be overtrained on some subsets of longer data sequences. Assume for the sake of discussion that system memory is five symbols. As PRBS24 contains a small subset of length-5 sequences, and PRBS26 contains more, the uptick in BER from PRBS24 to PRBS26 visible in Fig. 11 might be explained by overtraining. However, the BER variation seen for PRBS26 is small (it still shows good improvement and remains at a similar distance from the FEC threshold), so we cannot be certain that it is statistically significant and not the result of some variation in the experimental setting. That the lowest error occurs for the PRBS24 used in training is, of course, not unexpected.
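The coverage argument can be made concrete by counting how many distinct length-*n* symbol patterns a finite sequence contains. The sketch below is illustrative only: it uses uniformly random 16-level symbols from Python's `random` module rather than our PRBS data, and `pattern_coverage` is a hypothetical helper.

```python
import random


def pattern_coverage(symbols, n, levels):
    """Fraction of the levels**n possible length-n patterns present in `symbols`."""
    seen = {tuple(symbols[i:i + n]) for i in range(len(symbols) - n + 1)}
    return len(seen) / levels ** n


# Illustrative data: 20000 random symbols drawn from 16 levels (an assumption,
# standing in for one dimension of a 256QAM sequence).
rng = random.Random(0)
symbols = [rng.randrange(16) for _ in range(20000)]

cov3 = pattern_coverage(symbols, 3, 16)  # 16**3 = 4096 possible patterns
cov5 = pattern_coverage(symbols, 5, 16)  # 16**5 = 1048576 possible patterns
print(f"n=3 coverage: {cov3:.3f}, n=5 coverage: {cov5:.4f}")
```

Because a sequence of length *N* contains at most *N* − *n* + 1 windows, coverage of length-5 patterns is bounded far below that of length-3 patterns for the same data, which is the situation assumed in the overtraining discussion above.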

## 8. Conclusion

We experimentally demonstrated 20 Gbaud 256QAM and 40 Gbaud 128QAM generation with a SiP IQ modulator. A combination of linear pre-equalization and two 1D LUTs moved the BER below the 20% FEC threshold in back-to-back demonstrations, which was unattainable with linear pre-equalization alone. The LUT training results show the presence of nonlinear memory effects and motivated our search for LUT size reduction. The LUT solution, especially in its reduced-complexity 1D version (vs. 2D) with sparsely populated entries (R-LUT), is a reasonable-complexity solution for nonlinear predistortion. The reduced-size LUTs offered similar performance at roughly 60% to 70% of the original LUT size for 20 Gbaud 256QAM with *n* = 3 and 40 Gbaud 128QAM with *n* = 5. The generality of the LUT-based predistortion results was experimentally demonstrated on different data sets (varying PRBS orders).

## Appendix

## A. Predistortion transmission by means of full-size LUT

We present constellations of the predistorted QAM data at different LUT memory depths (*i.e.*, *n* = 3, 5, 7) for 20 Gbaud 256QAM and 40 Gbaud 128QAM transmission in Fig. 12 and Fig. 13, respectively. When running the transmission system at 20 Gbaud we are operating close to (but within) the 3 dB bandwidth of the modulator (inset of Fig. 2). We push the baud rate to 40 Gbaud to test our LUT solution in a region where the system is highly bandlimited and nonlinear (the modulator frequency response varies with signal level). We must scale down the modulation format to 128QAM at this baud rate.

At 40 Gbaud, the system under test suffers from higher ISI distortion in addition to the nonlinear distortion. Comparing, for example, Fig. 12(a) and Fig. 13(a), the nonlinear compensation seems more prominent for 40 Gbaud 128QAM, despite the fact that we have greater absolute error in 20 Gbaud 256QAM (see Fig. 3(c)). Due to the greater ISI effect, predistorted constellation points are more stretched, although the modulation format is smaller (128QAM vs. 256QAM). In addition, Fig. 13(a) shows an interesting artifact of using 1D rather than 2D LUTs. Since the distortion is treated separately for the in-phase and quadrature components, the predistorted constellation points are stretched in one dimension rather than forming an omnidirectional cloud. This effect is less prominent in the other plots, most likely due to the presence of noise in the LUT entries (see discussion in section 5.2).

The received constellations of 20 Gbaud 256QAM and 40 Gbaud 128QAM are given in Fig. 14. In this case, we combined the linear pre-equalizer with the full-size LUT-7 at the transmitter. Fig. 14 shows a more tightly packed constellation than Fig. 3(a)–3(b), where linear equalization alone was used at the transmitter. Taking into account the pattern-dependent nature of the nonlinear distortion is clearly effective.
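To illustrate what pattern-dependent correction means for one signal dimension, the following minimal sketch builds a 1D LUT by averaging the error at the center symbol of each *n*-symbol pattern and then subtracts that average before transmission. This is a common formulation of LUT predistortion (cf. [17]) and is offered as an assumption-laden illustration: the helper names, the toy channel, and the two-level alphabet are hypothetical, and the training procedure used in our experiments may differ in its details.

```python
from collections import defaultdict


def train_lut(tx, rx, n):
    """Average error (rx - tx) at the center symbol of each n-symbol pattern (n odd)."""
    half = n // 2
    sums = defaultdict(float)
    counts = defaultdict(int)
    for k in range(half, len(tx) - half):
        key = tuple(tx[k - half:k + half + 1])
        sums[key] += rx[k] - tx[k]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def predistort(tx, lut, n):
    """Subtract the stored average error; patterns missing from the LUT are left unchanged."""
    half = n // 2
    out = list(tx)
    for k in range(half, len(tx) - half):
        key = tuple(tx[k - half:k + half + 1])
        out[k] = tx[k] - lut.get(key, 0.0)
    return out


# Toy channel (an assumption): the error on each symbol depends on both neighbors.
tx = [-1, 1, 1, -1, -1, 1, -1, 1, 1, 1, -1, -1]
rx = list(tx)
for k in range(1, len(tx) - 1):
    rx[k] = tx[k] + 0.1 * tx[k - 1] * tx[k + 1]

lut = train_lut(tx, rx, 3)
pre = predistort(tx, lut, 3)
```

In a two-LUT arrangement, the same procedure would be run independently on the in-phase and quadrature components, which is why the predistorted points move along one axis at a time.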

## B. Electrical spectrum of the transmitted data

We present the electrical spectrum of 256QAM at 20 Gbaud to examine the effect of the various pre-equalization steps. We consider three filters used at the transmitter side: raised cosine (RC) pulse shaping, MMSE predistortion, and LUT-3 predistortion (a nonlinear filter). Pulse shaping is used to confine data to the passband of the DAC, and MMSE is applied to equalize that passband. These linear compensations are a necessary first step. We see in Fig. 15(a) and 15(b) the effects of shaping and equalization. Fig. 15(c), taken after nonlinear pre-compensation, shows no visible change to the spectrum. The LUT targets the nonlinear ISI.

## Funding

Natural Sciences and Engineering Research Council of Canada (NSERC) (CRDPJ 486716-15); Huawei Canada.

## Acknowledgments

The authors thank Dr. Zhuhong Zhang and his team at Huawei Canada for many useful discussions, and CMC Microsystems for fabrication of the MZM used in our experiments.

## References

**1. **Q. Cheng, M. Bahadori, M. Glick, S. Rumley, and K. Bergman, “Recent advances in optical technologies for data centers: a review,” Optica **5**(11), 1354–1370 (2018). [CrossRef]

**2. **J. Wang and Y. Long, “On-chip silicon photonic signaling and processing: a review,” Sci. Bulletin **63**(19), 1267–1310 (2018). [CrossRef]

**3. **P. Dong, L. Chen, C. Xie, L. L. Buhl, and Y. Chen, “50-Gb/s silicon quadrature phase-shift keying modulator,” Opt. Express **20**(19), 21181–21186 (2012). [CrossRef] [PubMed]

**4. **J. Lin, H. Sepehrian, L. A. Rusch, and W. Shi, “Single-carrier 72 GBaud 32QAM and 84 GBaud 16QAM transmission using a SiP IQ modulator with joint digital-optical pre-compensation,” Opt. Express **27**, 5610–5619 (2019). [CrossRef] [PubMed]

**5. **S. Zhalehpour, J. Lin, M. Guo, H. Sepehrian, Z. Zhang, L. A. Rusch, and W. Shi, “All-Silicon IQ Modulator for 100 GBaud 32QAM Transmissions,” in *Optical Fiber Communication Conference Postdeadline Papers*, OSA Technical Digest (online) (Optical Society of America, 2019), paper Th4A.5.

**6. **A. Napoli, P. W. Berenguer, T. Rahman, G. Khanna, M. M. Mezghanni, L. Gardian, E. Riccardi, A. C. Piat, S. Calabrò, S. Dris, A. Richter, J. K. Fischer, B. Sommerkorn-Krombholz, and B. Spinnler, “Digital pre-compensation techniques enabling high-capacity bandwidth variable transponders,” Opt. Commun. **409**, 52–65 (2018). [CrossRef]

**7. **A. Khilo, C. M. Sorace, and F. X. Kärtner, “Broadband linearized silicon modulator,” Opt. Express **19**(5), 4485–4500 (2011). [CrossRef] [PubMed]

**8. **G. Karam and H. Sari, “A data predistortion technique with memory for QAM radio systems,” IEEE Trans. Commun. **39**(2), 336–344 (1991). [CrossRef]

**9. **P. J. Winzer, “High-Spectral-Efficiency Optical Modulation Formats,” J. Lightw. Technol. **30**(24), 3824–3835 (2012). [CrossRef]

**10. **G. Khanna, B. Spinnler, S. Calabrò, E. De Man, U. Feiste, T. Drenski, and N. Hanik, “A Memory Polynomial Based Digital Pre-Distorter for High Power Transmitter Components,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2017), paper M2C.4. [CrossRef]

**11. **R. Elschner, R. Emmerich, C. Schmidt-Langhorst, F. Frey, P. Berenguer, J. Fischer, H. Grießer, D. Rafique, J. Elbers, and C. Schubert, “Improving Achievable Information Rates of 64-GBd PDM-64QAM by Nonlinear Transmitter Predistortion,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2018), paper M1C.2. [CrossRef]

**12. **L. Guan and A. Zhu, “Low-cost FPGA implementation of Volterra series-based digital predistorter for RF power amplifiers,” IEEE Trans. Microw. Theory Techn. **58**, 866–872 (2010). [CrossRef]

**13. **H. Faig, Y. Yoffe, and D. Sadot, “Dimensions-Reduced Volterra-Based Digital Pre-Distortion for Band-Limited Nonlinear Components,” in *Advanced Photonics (BGPP, IPR, NP, NOMA, Sensors, Networks, SPPCom, SOF)*, OSA Technical Digest (online) (Optical Society of America, 2018), paper SpM3G.2. [CrossRef]

**14. **S. Zhalehpour, J. Lin, H. Sepehrian, W. Shi, and L. Rusch, “Mitigating pattern dependent nonlinearity in SiP IQ-modulators via iterative learning control predistortion,” Opt. Express **26**, 27639–27649 (2018). [CrossRef] [PubMed]

**15. **S. Lange, S. Wolf, J. Lutz, L. Altenhain, R. Schmid, R. Kaiser, M. Schell, C. Koos, and S. Randel, “1-Tb/s Millimeter-Wave Signal Wireless Delivery at D-Band,” J. Lightw. Technol. **37**, 196–204 (2018).

**16. **X. Li, J. Yu, L. Zhao, K. Wang, C. Wang, M. Zhao, W. Zhou, and J. Xiao, “100 GBd Intensity Modulation and Direct Detection With an InP-Based Monolithic DFB Laser Mach–Zehnder Modulator,” J. Lightw. Technol. **36**(1), 97–102 (2018). [CrossRef]

**17. **J. Ke, Y. Gao, and J. Cartledge, “400 Gbit/s single-carrier and 1 Tbit/s three-carrier superchannel signals using dual polarization 16-QAM with look-up table correction and optical pulse shaping,” Opt. Express **22**(1), 71–84 (2014). [CrossRef] [PubMed]

**18. **S. Zhang, F. Yaman, Y. Huang, T. Inoue, K. Nakamura, E. Mateo, Y. Inada, T. Wang, and T. Ogata, “Trans-Pacific Transmission of Quad-Carrier 1Tb/s DP-8QAM Assisted by LUT-based MAP Algorithm,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2015), paper W3G.3. [CrossRef]

**19. **J. Zhang, P. Gou, M. Kong, K. Fang, J. Xiao, Q. Zhang, X. Xin, and J. Yu, “PAM-8 IM/DD Transmission Based on Modified Lookup Table Nonlinear Predistortion,” IEEE Photon. J. **10**, 1–9 (2018). [CrossRef]

**20. **S. Zhalehpour, J. Lin, H. Sepehrian, W. Shi, and L. Rusch, “Experimental demonstration of reduced-size LUT predistortion for 256QAM SiP Transmitter,” in *Optical Fiber Communication Conference*, OSA Technical Digest (online) (Optical Society of America, 2019), paper Th1D.3.

**21. **P. Varahram, S. Mohammady, A. Borhanuddin Mohd, and N. Sulaiman, *Power efficiency in broadband wireless communications* (CRC, 2014). [CrossRef]

**22. **D. Patel, “Design, Analysis, and Performance of a Silicon Photonic Traveling Wave Mach-Zehnder Modulator,” Masters Diss. McGill University (2014).