In this paper, we propose a class of large-girth QC-LDPC codes, designed to maximize the girth, with code rates ranging from 0.5 to 0.8, which leads to well-structured parity-check and generator matrices. Instead of implementing several FEC encoder and decoder engines in hardware, we design an efficient unified FPGA-based architecture with run-time reconfiguration capability. Apart from the four principal LDPC codes incorporated into the unified design, shortening is adopted to bridge the rate gap between the principal codes. With the proposed unified LDPC engine, signal-to-noise ratio (SNR) limits from −1 dB to 2.2 dB have been demonstrated at a BER of 10⁻¹² in the additive white Gaussian noise (AWGN) channel by FPGA emulation. The engine is suitable for both free-space optical (FSO) and fiber-optic communications, where a wide code rate range is preferred to deal with various channel impairments. To further verify the proposed unified code engine for FSO applications, we tested the scheme through a spatial light modulator (SLM)-based FSO channel emulator. We showed that, in the medium atmospheric turbulence regime, a post-FEC BER below 10⁻⁸ can be achieved without any interleaver or adaptive optics.
© 2018 Optical Society of America under the terms of the OSA Open Access Publishing Agreement
Forward error correction (FEC) codes represent a key technology that has become standard in nearly all digital communication systems; in particular, soft-decision low-density parity-check (LDPC) codes and turbo product codes have both been reported with capacity-approaching performance [2,3]. With emerging digital coherent technologies and efficient digital signal processing (DSP) techniques, well-designed and optimized single LDPC codes and convolutional LDPC codes have been demonstrated to have very low error floors, below the system's target bit-error rate (BER) [4,5]. Meanwhile, LDPC codes with rates as low as 0.7, combined with high-order modulation formats, have been investigated, as more and more bandwidth is devoted to FEC. Unlike the fiber-optic channel, the free-space optical (FSO) channel is highly dependent on weather conditions, such as fog, rain, wind, and atmospheric gases, which cause beam attenuation and are usually not fixed or stable [7,8]. Thanks to adaptive optics and efficient DSP schemes, FSO links have been demonstrated experimentally with spatial-mode multiplexing [9,10]. It is highly desirable to have an FEC engine that supports different code parameters and is capable of dealing with various FSO channel conditions.
Motivated by this, we first design four principal large-girth quasi-cyclic (QC) LDPC codes with code rates of 0.8, 0.7, 0.6, and 0.5, respectively, aiming to cover a wide range of signal-to-noise ratios with excellent performance. In this design, the higher rate codes are sub-codes of lower rate codes. Additionally, the shortening technique is employed so that code rates between those of the principal LDPC codes can be achieved for fine-tuning purposes. Moreover, the principal LDPC codes are carefully designed on the basis of circulant matrices for both the parity-check matrix and the generator matrix. This structured property leads to highly efficient encoder and decoder hardware designs. As a result, we propose a hardware design that is capable of run-time configuration of the code parameters, and we demonstrate its excellent performance over both the binary-input additive white Gaussian noise (BI-AWGN) channel and a spatial light modulator (SLM)-based FSO channel emulator. This unified adaptive LDPC code design is also suitable for dealing with uncompensated time-varying polarization-mode dispersion (PMD) and fiber nonlinear effects in fiber-optic channels.
The contributions of this paper can be summarized as follows. Firstly, we present our design scheme for a set of LDPC codes enabling an efficient unified encoder and decoder hardware implementation. In addition, to the best of our knowledge, this is the first work that exploits LDPC coding with code rates ranging from 0.45 to 0.8, offers run-time reconfiguration capability, and demonstrates high performance with a single LDPC engine.
The paper is organized as follows. In Section 2, we present the construction method of the proposed adaptive LDPC coding. In Section 3, we provide a detailed discussion of the implementation of the proposed LDPC coding scheme. Extensive emulation results, discussion, and analysis are presented in Section 4. In Section 5, we present the experimental demonstration of the proposed coding scheme. Finally, conclusions are drawn in Section 6.
2. Adaptive LDPC code design
In this section, we provide an in-depth description of the proposed wide-range adaptive LDPC coding, covering the construction of both the parity-check matrix and the generator matrix.
2.1 Adaptive parity check matrix construction
In this paper, we choose a large-girth QC-LDPC design because of the following advantages: (i) a large minimum distance provides excellent waterfall performance: for a given column weight, Tanner's bound shows that the minimum distance of an LDPC code grows exponentially with the girth; (ii) a large girth also provides good error-floor performance; and (iii) the QC-LDPC structure leads to an efficient implementation. Following the guideline in [12], the parity-check matrix H of a (γ, ρ) regular binary QC-LDPC code can be represented as a γ × ρ array of b × b circulant permutation matrices. The code rate R of such a regular LDPC code is lower-bounded by 1 − γ/ρ. The codeword length, the number of check nodes, and the number of information bits are ρb, γb, and ρb − r, respectively, where r denotes the rank of the parity-check matrix; since r ≤ γb, it follows that R = (ρb − r)/(ρb) ≥ 1 − γ/ρ. It is worth noting that the circulant permutation-based design usually leads to a non-full-rank parity-check matrix, which affects the generator matrix design. Additionally, with the construction method described above, the codeword length increases as the girth requirement increases.
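To make the block structure concrete, the following Python sketch assembles a small H from circulant permutation blocks and checks the rate bound. The block size b = 7 and the exponent rule (i·j mod b) are toy assumptions chosen for brevity, not the exponents of the proposed codes, and all function names are hypothetical.

```python
def circulant_permutation(b, shift):
    """b x b identity matrix whose ones are cyclically shifted right by `shift`."""
    return [[1 if (r + shift) % b == c else 0 for c in range(b)] for r in range(b)]


def build_qc_h(exponents, b):
    """Assemble H as a gamma x rho array of b x b circulant permutation blocks."""
    h = []
    for row_exps in exponents:
        blocks = [circulant_permutation(b, e) for e in row_exps]
        for r in range(b):
            h.append([bit for blk in blocks for bit in blk[r]])
    return h


def gf2_rank(matrix):
    """Rank over GF(2); each row is packed into an integer so XOR adds rows."""
    rows = [int("".join(map(str, row)), 2) for row in matrix]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            low = pivot & -pivot          # lowest set bit acts as the pivot column
            rows = [r ^ pivot if r & low else r for r in rows]
    return rank


# Toy parameters (assumptions, not the paper's values): gamma = 3, rho = 5, b = 7
gamma, rho, b = 3, 5, 7
exponents = [[(i * j) % b for j in range(rho)] for i in range(gamma)]
H = build_qc_h(exponents, b)

r = gf2_rank(H)
n = rho * b                   # codeword length
k = n - r                     # number of information bits
rate = k / n
print(f"n={n}, checks={gamma * b}, rank={r}, k={k}, rate={rate:.3f}")
```

Because every block row of such an H sums to the all-ones vector, at least γ − 1 rows are linearly dependent, which is one way to see why the rank r stays below γb.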
For the construction of a class of rate-adaptive QC-LDPC codes, one of the most straightforward ways is to design a set of parity-check matrices with different code parameters, such as code rate and codeword length, and then instantiate several FEC cores in hardware to deal with various channel conditions. However, this is not an efficient solution, since only a small portion of the hardware is active at any given time and the total logic usage is proportional to the number of FEC cores. Alternatively, shortening and puncturing techniques, as adopted in Reed-Solomon codes, can be applied; however, shortening-based LDPC codes have been observed to exhibit an early error floor . As the shortening technique essentially eliminates several block columns from the parity-check matrix, the structure of the shortened code is equivalent to that of the original code; specifically, the girth does not increase when several block columns are eliminated. To solve this problem, given column weight γ, we first design a large-girth code with a small row weight ρs, and then extend the designed parity-check matrix to a larger row weight ρl. For example, as shown in Fig. 1(a), we first design a girth-10 (3, 10) QC-LDPC code with a code rate of 0.7 and append 5 column blocks to generate a girth-8 (3, 15) QC-LDPC code with a code rate of 0.8. We denote the code with code rate of 0.8 as and code rate of 0.7 as . In this case, unlike with shortening, both the lower-rate and the higher-rate LDPC codes are the best in terms of girth maximization. Meanwhile, the subset property of the proposed design also benefits the generator matrix design, as shown in Section 2.2.
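The relation between nested block columns and girth can be checked numerically. The sketch below computes the girth of the Tanner graph of a QC code by breadth-first search and compares a toy "extended" code with the code formed by its first block columns. The exponent rule and sizes are illustrative assumptions only (they do not reproduce the girth-10 and girth-8 codes of Fig. 1), and the function names are hypothetical.

```python
from collections import deque


def tanner_adjacency(exponents, b):
    """Adjacency lists of the Tanner graph; variable nodes first, then check nodes."""
    gamma, rho = len(exponents), len(exponents[0])
    n_var = rho * b
    adj = [[] for _ in range(n_var + gamma * b)]
    for i in range(gamma):
        for j in range(rho):
            for r in range(b):
                v = j * b + (r + exponents[i][j]) % b   # one in row r of block (i, j)
                c = n_var + i * b + r
                adj[v].append(c)
                adj[c].append(v)
    return adj


def girth(adj):
    """Length of the shortest cycle, found by a BFS from every node."""
    best = float("inf")
    for start in range(len(adj)):
        dist = {start: 0}
        parent = {start: -1}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    parent[w] = u
                    queue.append(w)
                elif parent[u] != w:                    # non-tree edge closes a cycle
                    best = min(best, dist[u] + dist[w] + 1)
    return best


b = 7
extended = [[(i * j) % b for j in range(5)] for i in range(3)]   # toy (3, 5) code
base = [row[:3] for row in extended]                             # first 3 block columns
g_extended = girth(tanner_adjacency(extended, b))
g_base = girth(tanner_adjacency(base, b))
print(g_base, g_extended)
```

Since the base graph is a subgraph of the extended one, its girth can never be smaller; this is why designing the small-row-weight code first and then appending column blocks lets the lower-rate code keep the larger girth.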
A column weight of three is widely adopted in optical communication systems, where the code rate is usually larger than 0.7, due to its low implementation complexity . However, a code rate as low as 0.5 is preferred in free-space optical communication systems to deal with the time-varying turbulence channel. Our emulation results reveal that column weight three is not feasible for low rates, e.g., 0.6 and 0.5, without an error floor. As a result, following the same procedure discussed above, given a column weight of 4, we first design a girth-10 (4, 8) QC-LDPC code with a code rate of 0.5 and append 2 column blocks to generate a girth-8 (4, 10) QC-LDPC code with a code rate of 0.6. As shown in Fig. 1(b), we denote the code with code rate of 0.6 as and code rate of 0.5 as . Our goal was to design a single H-matrix such that portions of it would form all four codes. Unfortunately, the best we could find by simulation is two Hqc matrices, one for the column-weight-three codes and the other for the column-weight-four codes, while keeping the codeword length reasonably low.
2.2 Adaptive generator matrix construction
Given a parity-check matrix, one of the straightforward methods to obtain a systematic-form generator matrix is to perform Gaussian elimination. After the Gaussian elimination process, Hqc has the form [Pᵀ I], and thus the generator matrix Gqc can be written in the systematic form [I P]. During the Gauss-Jordan elimination, several columns need to be swapped, which is equivalent to changing the positions of the corresponding variable nodes. Therefore, we need to take care of those variable nodes after the encoding process. Meanwhile, some rows are linearly dependent on the preceding rows, which reduces the rank of the parity-check matrix. The systematic form of the generator matrix is advantageous for the latency of the encoding process. However, the unstructured parity matrix P prevents an efficient hardware implementation because of the large memory consumption incurred in storing the entire parity matrix. Therefore, we consider the systematic-circulant (SC) form of the generator matrix instead .
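The Gauss-Jordan procedure described above, including the column swaps and the handling of dependent rows, can be sketched as follows. This is a generic textbook routine written for a toy matrix, not the authors' tool flow; the 4 × 7 matrix (a (7, 4) Hamming parity-check matrix plus one deliberately redundant row) and all function names are assumptions for illustration.

```python
def to_systematic(h):
    """Gauss-Jordan over GF(2). Returns (G, col_order), where G = [I | P] and
    col_order[t] is the original variable-node index of codeword position t."""
    h = [row[:] for row in h]
    m, n = len(h), len(h[0])
    order = list(range(n))                 # records the column swaps
    rank = 0
    while rank < m:
        pivot = next(((r, c) for c in range(rank, n)
                      for r in range(rank, m) if h[r][c]), None)
        if pivot is None:                  # remaining rows are linearly dependent
            break
        r, c = pivot
        h[rank], h[r] = h[r], h[rank]      # row swap
        if c != rank:                      # column swap = reordering variable nodes
            for row in h:
                row[rank], row[c] = row[c], row[rank]
            order[rank], order[c] = order[c], order[rank]
        for i in range(m):                 # clear the pivot column elsewhere
            if i != rank and h[i][rank]:
                h[i] = [x ^ y for x, y in zip(h[i], h[rank])]
        rank += 1
    a = [row[rank:] for row in h[:rank]]   # H is now [I | A]; hence P = A^T
    k = n - rank
    p = [[a[i][j] for i in range(rank)] for j in range(k)]
    g = [[int(i == j) for j in range(k)] + p[i] for i in range(k)]
    return g, order[rank:] + order[:rank]


def satisfies_checks(h, g_row, col_order):
    """Map a row of G back to the original column order and test H c^T = 0."""
    cw = [0] * len(g_row)
    for pos, col in enumerate(col_order):
        cw[col] = g_row[pos]
    return all(sum(x & y for x, y in zip(row, cw)) % 2 == 0 for row in h)


# Toy matrix: three independent checks plus a fourth row equal to row 0 XOR row 1
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1],
     [0, 1, 1, 0, 1, 1, 0]]
G, col_order = to_systematic(H)
all_valid = all(satisfies_checks(H, row, col_order) for row in G)
print(len(G), "x", len(G[0]), "generator; all rows valid:", all_valid)
```

The returned `col_order` is the bookkeeping that the text refers to as taking care of the swapped variable nodes after encoding. Note that P here is a dense, unstructured matrix, which is exactly the storage problem that motivates the SC form.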
Since the parity-check matrix construction method based on permutation matrices usually does not yield a full-rank matrix, we consider the case r < γb in this paper. For more details on how the SC form of the generator matrix can be obtained from a QC form of the parity-check matrix, we refer the interested reader to . In this case, the desired generator matrix has a size of (ρb−r) × ρb and the form Gqc = [G | Q]ᵀ, where both G and Q are circulant matrices with size of (ρ−γ)b × ρb and lb × ρb, where l is the number of independent rows in Hqc. As shown in Fig. 2, we obtain the generator matrices for all four pre-constructed LDPC codes. Since each submatrix in G is a circulant matrix, only the first row of each submatrix needs to be stored. Thus, the benefit of using the SC-form generator matrix is an approximately b-fold reduction in memory. The memory usage is further reduced since the two codes of each column weight (three and four) share the same generator matrix.
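The b-fold memory saving follows directly from the definition of a circulant: every row is a cyclic shift of the first one. A minimal sketch, with a toy first row of length 7 (an assumption for illustration) and hypothetical function names:

```python
def circulant_row(first_row, i):
    """Row i of a circulant: the first row cyclically shifted right by i positions."""
    b = len(first_row)
    i %= b
    return first_row[b - i:] + first_row[:b - i]


def expand_circulant(first_row):
    """Rebuild the full b x b circulant from its stored first row."""
    return [circulant_row(first_row, i) for i in range(len(first_row))]


first = [1, 0, 1, 1, 0, 0, 0]              # toy first row, b = 7
full = expand_circulant(first)

bits_full = len(full) * len(full[0])       # b * b bits if stored explicitly
bits_stored = len(first)                   # b bits when only the first row is kept
reduction = bits_full // bits_stored
print(f"explicit: {bits_full} bits, first row only: {bits_stored} bits, x{reduction}")
```

For a single circulant the reduction factor equals b exactly; for the whole generator matrix it is only approximately b, as the text states, because the rows of Q are handled separately.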
3. Unified FPGA-based architecture
In this section, we discuss the emulation platform for binary QC-LDPC codes. As shown in Fig. 3, the emulation platform consists of a pseudorandom binary sequence generator, an LDPC encoder, an additive white Gaussian noise generator, a log-likelihood ratio (LLR) calculator, an LDPC decoder, an error counter, and a virtual input/output (VIO) module.
3.1 Unified LDPC encoder architecture
As shown in Fig. 4(a), the overall adaptive LDPC encoder consists of memory blocks storing the generator matrix, a set of shift-register-adder-accumulator (SRAA) processors, a set of parallel-to-serial converters, a mux unit, and a first-in first-out (FIFO) memory that assembles the codeword. There are two major concerns in our design. The first is the storage of the generator matrix. By using the SC form of the generator matrix, as described in Section 2.2, we can store the first row of each circulant instead of storing the entire parity matrix. The memory required to store the generator matrix is of size of while the matrix with size of . To be more specific, the number of linearly dependent rows of is two and three for the column-weight-three and column-weight-four codes, respectively. Given block size , the memory size used to store the entire matrix can be calculated as for and , while the storage of is for and . It is worth noting that the proposed design allows us to construct the generator matrix such that is the sub-code of ; this property also applies to and . The second concern is the complexity of the SRAA, which is shown in Fig. 4(b). At the beginning, with three or four SRAAs employed in parallel, the first block row of the parity matrix is loaded into the linear-feedback shift register (LFSR), and the first portion of the matrix multiplication is completed after clock cycles. Then the second block row of the parity matrix is loaded into the LFSR, and the accumulated sum of the matrix multiplication is completed after another clock cycles. This process is repeated until the entire information sequence has been shifted into the encoder. With SRAAs, the parallel parity sequence can be completed in clock cycles. Each SRAA requires one LFSR of size , one size of AND gate, one size of XOR gate, and one size of register.
Four SRAA processing units are instantiated in our adaptive design: the first three SRAAs are shared by the column-weight-three and column-weight-four LDPC codes, while the fourth SRAA is powered up only when the encoder is configured to encode and .
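The SRAA operation can be modeled in software as a bit-serial vector-by-circulant multiplication over GF(2). The sketch below is a behavioral model under simplifying assumptions (one SRAA, one information bit per clock cycle, toy block size b = 5), not a description of the authors' RTL; the function names are hypothetical. It checks the shift-and-accumulate result against an explicit matrix product.

```python
import random


def sraa_parity(info_blocks, first_rows):
    """Behavioral model of one SRAA: accumulate u * G over GF(2), where each
    generator block is a circulant known only through its first row."""
    b = len(first_rows[0])
    acc = [0] * b                           # accumulator register
    for block, first_row in zip(info_blocks, first_rows):
        reg = list(first_row)               # load the first row of the next circulant
        for u in block:                     # one information bit per clock cycle
            acc = [a ^ (u & g) for a, g in zip(acc, reg)]   # AND, then XOR-accumulate
            reg = [reg[-1]] + reg[:-1]      # cyclic right shift gives the next row
    return acc


def direct_parity(info_blocks, first_rows):
    """Reference: explicit vector-matrix product with the expanded circulants."""
    b = len(first_rows[0])
    out = [0] * b
    for block, first_row in zip(info_blocks, first_rows):
        for i, u in enumerate(block):
            row = first_row[b - i:] + first_row[:b - i]     # row i of the circulant
            out = [o ^ (u & g) for o, g in zip(out, row)]
    return out


rng = random.Random(0)
b, n_blocks = 5, 3                          # toy sizes (assumptions)
first_rows = [[rng.randint(0, 1) for _ in range(b)] for _ in range(n_blocks)]
info_blocks = [[rng.randint(0, 1) for _ in range(b)] for _ in range(n_blocks)]

parity = sraa_parity(info_blocks, first_rows)
print(parity, parity == direct_parity(info_blocks, first_rows))
```

In this model each information block occupies the SRAA for b clock cycles, and running γ such units in parallel (one per parity block column) yields all parity blocks at the same time, which mirrors the three or four parallel SRAAs described above.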
3.2 Unified LDPC decoder architecture
We adopt the layered scaled min-sum algorithm for the decoder of the proposed adaptive LDPC codes . Let and denote the iteration index and layer index, and represent the LLR from the channel and a posteriori probability (APP) at iteration index of and layer index of . The overall architecture of the adaptive LDPC decoder is depicted in Fig. 5. Two types of memories, inferred as block RAM (BRAM) resources, store and , denoted as LLR mem and APP mem in Fig. 5. At each layered iteration, and are first read from the memories and fed to the variable node unit (VNU); one VNU output is passed to the FIFO unit as output bits, and the other is forwarded to the check node unit (CNU), which performs the scaled min-sum computation. The output of the CNU is written back to the APP memory at that layer, and the most up-to-date information is used in the next layer/iteration. Similar to the encoder architecture, four layers of APP memory are instantiated in the design in order to accommodate the four codes. Among those, the first three are shared by the column-weight-three and column-weight-four codes. Additionally, two parity-check matrices are stored to address the memories.
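For readers unfamiliar with the algorithm, the following is a floating-point sketch of a textbook layered scaled min-sum decoder. It is not the authors' fixed-point architecture: the scaling factor 0.75, the iteration count, the (7, 4) Hamming toy code, and the function name are all assumptions, and each check is treated as its own layer, whereas in a QC-LDPC code a layer would be one block row of b checks processed in parallel.

```python
def layered_scaled_min_sum(checks, channel_llr, max_iter=10, alpha=0.75):
    """checks[c] lists the variable nodes of check c; a positive LLR favors bit 0."""
    app = list(channel_llr)                          # a posteriori LLRs
    c2v = [[0.0] * len(row) for row in checks]       # check-to-variable messages
    for _ in range(max_iter):
        for c, row in enumerate(checks):             # process one layer at a time
            # Variable node step: remove this check's previous contribution.
            q = [app[v] - c2v[c][k] for k, v in enumerate(row)]
            for k, v in enumerate(row):
                others = q[:k] + q[k + 1:]
                sign = -1.0 if sum(x < 0 for x in others) % 2 else 1.0
                # Check node step: scaled minimum magnitude of the other inputs.
                c2v[c][k] = alpha * sign * min(abs(x) for x in others)
                app[v] = q[k] + c2v[c][k]            # updated APP, used by later layers
    return [int(x < 0) for x in app]


# Toy example: (7, 4) Hamming code, all-zero codeword, bit 0 received unreliably wrong
checks = [[0, 1, 3, 4], [0, 2, 3, 5], [1, 2, 3, 6]]
llr = [-1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0]
decoded = layered_scaled_min_sum(checks, llr)
print(decoded)
```

The key property of the layered schedule is visible in the inner loop: the APP value written by one layer is read immediately by the next, which is why such decoders typically need fewer iterations than a flooding schedule.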
3.3 Additional details of the emulator
Besides the encoder and decoder implementations discussed above, the emulator contains several other relevant components. Firstly, we use the polynomial as the generator polynomial of the pseudorandom binary sequence (PRBS) generator module to produce a serial binary sequence. Secondly, a Box-Muller-based Gaussian noise generator is employed to model the additive white Gaussian noise (AWGN) channel. The output of the noise generator is quantized to 16 bits with 11 fractional bits. The accuracy of the noise generator is verified by estimating the bit error rate (BER) versus SNR for binary phase-shift keying (BPSK) transmission. The VIO module, which is an embedded debugging tool of the Vivado Design Suite, is used to record the number of uncoded errors, the number of transmitted codewords, and the number of decoded errors by comparing the transmitted data with the decoded data. Meanwhile, the VIO is also used to configure the code selection and the SNR.
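The emulator chain described here (PRBS source, Box-Muller noise, fixed-point quantization, BPSK BER check) can be mimicked in a few lines. This is a software sketch under stated assumptions: the PRBS polynomial x⁷ + x⁶ + 1 is a common choice used only for illustration, the 16-bit word is assumed to be signed two's complement, and the Eb/N0 convention for uncoded BPSK is assumed. Function names are hypothetical.

```python
import math
import random


def prbs7(n, state=0x7F):
    """Fibonacci LFSR for x^7 + x^6 + 1 (assumed polynomial), period 127."""
    bits = []
    for _ in range(n):
        new = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new) & 0x7F
        bits.append(new)
    return bits


def box_muller(rng):
    """One standard normal sample from two uniform samples."""
    u1 = 1.0 - rng.random()                 # in (0, 1], avoids log(0)
    u2 = rng.random()
    return math.sqrt(-2.0 * math.log(u1)) * math.cos(2.0 * math.pi * u2)


def quantize(x, frac_bits=11, total_bits=16):
    """Round to a signed fixed-point integer and saturate (assumed format)."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (total_bits - 1)), (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, scaled))


def bpsk_ber(ebn0_db, n_bits, seed=1):
    """Uncoded BPSK over AWGN with quantized noise samples."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / (2.0 * 10 ** (ebn0_db / 10.0)))
    errors = 0
    for bit in prbs7(n_bits):
        symbol = 1.0 - 2.0 * bit            # bit 0 -> +1, bit 1 -> -1
        noise = quantize(sigma * box_muller(rng)) / 2048.0
        errors += int((symbol + noise < 0) != bool(bit))
    return errors / n_bits


measured = bpsk_ber(4.0, 200_000)
theory = 0.5 * math.erfc(math.sqrt(10 ** 0.4))   # Q(sqrt(2 Eb/N0)) at 4 dB
print(f"measured {measured:.4e}, theory {theory:.4e}")
```

Comparing the measured uncoded BER with the closed-form BPSK curve is the same sanity check the text describes for validating the hardware noise generator.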
4. Emulation results and analysis
Based on the discussion in the previous sections, we first construct four QC-LDPC codes with rates of 0.8, 0.7, 0.6, and 0.5, aiming to provide excellent performance at each specific code rate by maximizing the girth; these codes provide coarse tuning. Fine tuning is then achieved via shortening, which bridges the SNR gap between the four codes. We verify the BER performance of the proposed run-time reconfigurable adaptive LDPC coding on a field-programmable gate array (FPGA) platform. Unlike the reconfigurability offered by the Xilinx Reed-Solomon core, for which a rebuild is mandatory for different code parameters, our proposed design is capable of run-time reconfiguration of the code parameters, including the code rate, ranging from 0.5 to 0.8, and the shortening sequence length. The emulator is implemented in a Kintex UltraScale XCKU040-2FFVA1156E FPGA device from Xilinx. Such devices provide the best price/performance/watt at 20 nm, and they are suitable for DSP-intensive applications in next-generation FSO and fiber-optic communication systems. The clock frequency is set to 200 MHz to accommodate the de-multiplexed data from front to end as well as to meet the timing requirements of the design. With this clock frequency, it can be derived that the decoding process for the QC-LDPC (37005, 29606) code with 30 layered iterations will take clock cycles, where is the latency of each processing unit (expressed in clock cycles), and is the maximum number of iterations. Hence, this decoder has a throughput of 100 Mb/s without any pipelining or early termination criterion. Regarding the LDPC encoder, a throughput of 200 Mb/s can be achieved with a latency of only 8 clock cycles.
With regard to the logic utilization, as summarized in Table 1, the encoder utilizes a large number of flip-flops compared to the other processing units because of the need to store the generator matrix as well as the flip-flops used in the SRAA units. For the LDPC decoder, the logic usage is relatively low, as no pipelining is involved. However, a large amount of block RAM is consumed in storing the initial channel LLR information and the check-to-variable messages. For comparison, we provide the utilization of each separate design as well as that of the proposed unified design. The unified design occupies only slightly more resources than the largest of the four separate codes, the difference being due to the more complex controller design and the additional storage for the parity-check and generator matrices. Compared with having four individual codes in a single design, the unified design offers a significant reduction in utilization. Overall, the resource utilization of the proposed unified design is reasonably low. However, achieving multi-gigabit throughput, either by creating several replicas of this unified design or by adding more pipeline stages, requires further investigation and justification.
The FPGA-based emulation was conducted over the binary-input (BI)-AWGN channel, and 8-bit precision of the LLRs is used in the binary LDPC decoder. The BER vs. SNR performance is depicted in Fig. 6 with 36 layered iterations. The results show excellent waterfall and error-floor performance of the proposed principal LDPC codes. The good error-floor behavior can be explained by the fact that the size and the number of trapping sets increase as the girth increases. Additionally, for illustration purposes, the performance of LDPC codes shortened from the principal LDPC codes is verified as well, in order to bridge the gap between the principal LDPC codes. From Fig. 6 and Table 2, SNR limits of 2.25 dB, 1.73 dB, 1.14 dB, 0.63 dB, 0.2 dB, −0.24 dB, −0.65 dB, and −1.1 dB can be achieved at a BER of 10⁻¹² for code rates of 0.8, 0.75, 0.7, 0.65, 0.6, 0.55, 0.5, and 0.45, respectively, by using our run-time reconfigurable LDPC engine. Therefore, we believe that the proposed run-time reconfigurable large-girth QC-LDPC code is one of the promising solutions for next-generation FSO and fiber-optic communication systems.
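The intermediate rates follow from simple arithmetic: shortening s information bits of an (n, k) code gives an (n − s, k − s) code. As a worked example, the sketch below computes how many bits would have to be shortened from the (37005, 29606) code quoted in Section 4 to reach a rate of 0.75. It assumes bit-level shortening of the rate-0.8 code; the text does not state the granularity actually used (e.g., whole block columns), and the function names are hypothetical.

```python
import math


def shortening_length(n, k, target_rate):
    """Smallest s such that (k - s) / (n - s) does not exceed target_rate."""
    return max(0, math.ceil((k - target_rate * n) / (1.0 - target_rate)))


def shortened_rate(n, k, s):
    """Rate of the (n - s, k - s) shortened code."""
    return (k - s) / (n - s)


n, k = 37005, 29606                     # rate-0.8 principal code from Section 4
s = shortening_length(n, k, 0.75)
rate = shortened_rate(n, k, s)
print(f"shorten {s} bits -> ({n - s}, {k - s}) code, rate {rate:.4f}")
```

The shortened bits are fixed to known values (typically zeros) and not transmitted, so the same encoder and decoder hardware can be reused; only the shortening length has to be configured at run time.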
5. FSO experimental demonstration
In this section, we discuss the experimental setup for testing the proposed unified LDPC code engine. Figure 7 shows the experimental setup of the transmission system. For simplicity, we choose a direct-detection scheme with on-off keying (OOK) modulation in this demonstration.
5.1 Free-space optical channel emulator
In our experiment, we emulate the atmospheric turbulence channel with three reflective phase-only spatial light modulators (SLMs), on which randomly generated azimuthal phase patterns yielding Andrews' spectrum are recorded. The SLMs we use offer a resolution of 1920 × 1080 pixels within a 15.36 mm × 8.64 mm active area. In detail, the time-varying atmospheric turbulence is emulated by continuously updating the random phase patterns at a 20 Hz frame rate. This turbulence model is built on the Rytov variance of , with the inner scale set to 1 mm, an outer scale of 1 m, path length L = 10 km, and turbulence strength . The real emulator link path is around 1 m. By comparing the statistical distribution with theoretical results, we confirmed that the emulator provides an FSO channel in the medium turbulence regime .
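For orientation, the plane-wave Rytov variance is commonly written as σ_R² = 1.23 C_n² k^(7/6) L^(11/6), with k = 2π/λ. The sketch below evaluates it for the 1550 nm wavelength and 10 km path length given in the text. The refractive-index structure parameter C_n² = 10⁻¹⁵ m^(−2/3) is purely a placeholder assumption, since the value used in the experiment is not reproduced here; the function name is hypothetical.

```python
import math


def rytov_variance(cn2, wavelength_m, path_length_m):
    """Plane-wave Rytov variance: 1.23 * Cn^2 * k^(7/6) * L^(11/6)."""
    k = 2.0 * math.pi / wavelength_m            # optical wavenumber in rad/m
    return 1.23 * cn2 * k ** (7.0 / 6.0) * path_length_m ** (11.0 / 6.0)


# Cn^2 below is an assumed placeholder, not the value from the experiment
sigma_r2 = rytov_variance(cn2=1e-15, wavelength_m=1550e-9, path_length_m=10e3)
print(f"Rytov variance: {sigma_r2:.2f}")
```

Values of σ_R² well below 1 are usually associated with weak turbulence and values well above 1 with strong turbulence, so a result on the order of unity corresponds to a moderate regime.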
5.2 Transmission system setup
In the transmission system, at the transmitter side, the output of a tunable continuous-wave (CW) laser with a 10 kHz linewidth, operating at 1550 nm, is sent to a Mach-Zehnder modulator. OOK modulation is adopted in our system. The modulated signal is boosted by an erbium-doped fiber amplifier (EDFA) with a 6 dB noise figure. The signal is then coupled with an amplified spontaneous emission (ASE) noise source, whose noise level can be tuned for testing the rate-adaptive coding performance. The noisy signal is then passed through the FSO emulator described above. At the receiver side, the received beam is coupled into a single-mode fiber and then filtered by a tunable optical filter to select the target wavelength of 1550 nm. A photodetector (PD) with a transimpedance amplifier converts the optical signal to baseband. An ADC/DAC FPGA mezzanine card (FMC) module from Analog Devices is used to transmit and receive the electrical RF signals. The AD-FMCDAQ2-EBZ module comprises a 14-bit, 1.0 GSa/s, JESD204B ADC and a 16-bit, 2.8 GSa/s, JESD204B DAC. In the FPGA, the transmitted signal is upsampled by a factor of four. We assemble 4 parallel signals through the JESD204B interface and transmit at 1.0 GSa/s through the DAC. At the receiver side, the ADC samples the signal at 1.0 GSa/s, and the digital signal is then downsampled by a factor of four before being sent to the decoder. Both the encoder and the decoder operate at a clock frequency of 250 MHz.
5.3 Experimental results and discussion
The BER performance of the unified code engine is shown in Fig. 8. With no bit-interleaver or adaptive optics in the system, we observed a post-FEC BER below 10⁻⁸. As the BER approaches 10⁻⁹, we found that burst errors caused by channel fading, which results from the power fluctuation of the turbulence channel, produce an error floor around 10⁻⁹. To improve this situation, one can implement a bit-interleaver across several codewords, which spreads burst errors over different bit positions. Furthermore, from the channel equalization point of view, adaptive optics can be used to mitigate the turbulence strength, which leads to a more stable channel. We will further investigate the unified code engine in an outdoor free-space testbed in which the technologies mentioned above are implemented.
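The effect of a bit-interleaver across several codewords can be illustrated with a simple row-column block interleaver. This is a generic sketch of the mitigation suggested above, not something implemented in the reported experiment; the interleaving depth of 4, the toy codeword length of 12, and the function names are assumptions.

```python
def interleave(codewords):
    """Transmit bit 0 of every codeword, then bit 1 of every codeword, and so on."""
    length = len(codewords[0])
    return [cw[i] for i in range(length) for cw in codewords]


def deinterleave(stream, depth):
    """Inverse operation: every depth-th bit belongs to the same codeword."""
    return [stream[d::depth] for d in range(depth)]


depth, length = 4, 12                                # toy sizes (assumptions)
codewords = [[0] * length for _ in range(depth)]     # all-zero codewords for clarity

stream = interleave(codewords)
for pos in range(10, 18):                            # a burst of 8 consecutive errors
    stream[pos] ^= 1

received = deinterleave(stream, depth)
errors_per_codeword = [sum(cw) for cw in received]
print(errors_per_codeword)
```

Without interleaving, the same burst would fall entirely into one codeword; with depth-4 interleaving it is split evenly, so each decoder input sees only a quarter of the burst.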
6. Concluding remarks
In this paper, we first proposed a design methodology for a class of large-girth QC-LDPC codes and exploited the well-structured parity-check matrix in the generator matrix design. Aided by the properties of circulant matrices, we provided an in-depth discussion of the efficient design of a run-time reconfigurable FPGA-based unified LDPC code engine. Unlike existing FEC cores that need to be rebuilt for different code parameters, our single unified LDPC code engine is capable of run-time configuration of code rates from 0.45 to 0.8, and excellent performance has been demonstrated down to a BER of 10⁻¹² over the AWGN channel. The demonstrated SNR limits, which range from −1 dB to 2.2 dB, provide considerable flexibility to accommodate different turbulence channel conditions. Finally, we demonstrated the rate-adaptation capability of the proposed scheme with an FSO emulator. The post-FEC BER performance can be further improved by various adaptive optics and DSP technologies. The proposed adaptive unified LDPC core is also applicable in fiber-optic communications to deal with imperfectly compensated time-varying PMD and fiber nonlinear effects.
Office of Naval Research (ONR) MURI program (N00014-13-1-0627).
1. ITU-T G.975.1, Forward error correction for high bit-rate DWDM submarine systems, 2004.
2. F. Paludi, D. A. Morero, T. Goette, M. Schnidrig, F. Ramos, and M. R. Hueda, “Low-complexity turbo product code for high-speed fiber-optics systems based on expurgated BCH codes,” in ISCAS, 429–432 (2016).
3. L. Schmalen, V. Aref, J. Cho, D. Suikat, D. Rosener, and A. Leven, “Spatially coupled soft-decision error correction for future lightwave systems,” J. Lightwave Technol. 33(5), 1109–1116 (2015).
4. D. Chang, F. Yu, Z. Xiao, Y. Li, N. Stojanovic, C. Xie, X. Shi, X. Xu, and Q. Xiong, “FPGA verification of a single QC-LDPC code for 100 Gb/s optical systems without error floor down to BER of 10⁻¹⁵,” in OFC/NFOEC (2011), paper OTuN2.
5. L. Schmalen, D. Suikat, V. Aref, and D. Rosener, “On the design of capacity-approaching unit-memory spatially coupled LDPC codes for optical communications,” in ECOC (2016), pp. 1–3.
6. Z. Zhang, C. Li, J. Chen, T. Ding, Y. Wang, H. Xiang, Z. Xiao, L. Li, M. Si, and X. Cui, “Coherent transceiver operating at 61-Gbaud/s,” Opt. Express 23(15), 18988–18995 (2015).
8. J. Renaudier, R. Rios-Muller, P. Tran, L. Schmalen, and G. Charlet, “Spectrally efficient 1-Tb/s transceivers for long-haul optical systems,” J. Lightwave Technol. 33(7), 1452–1458 (2015).
9. N. Zhao, X. Li, G. Li, and J. M. Kahn, “Capacity limits of spatially multiplexed free-space communication,” Nat. Photonics 9(12), 822–826 (2015).
10. Y. Ren, H. Huang, G. Xie, N. Ahmed, Y. Yan, B. I. Erkmen, N. Chandrasekaran, M. P. J. Lavery, N. K. Steinhoff, M. Tur, S. Dolinar, M. Neifeld, M. J. Padgett, R. W. Boyd, J. H. Shapiro, and A. E. Willner, “Atmospheric turbulence effects on the performance of a free space optical link employing orbital angular momentum multiplexing,” Opt. Lett. 38(20), 4062–4065 (2013).
11. R. M. Tanner, “A recursive approach to low complexity codes,” IEEE Trans. Inf. Theory 27(9), 533–547 (1981).
12. M. P. C. Fossorier, “Quasi-cyclic low-density parity-check codes from circulant permutation matrices,” IEEE Trans. Inf. Theory 50(8), 1788–1793 (2004).
13. Y. Zhang and I. B. Djordjevic, “Staircase rate-adaptive LDPC-coded modulation for high-speed intelligent optical transmission,” in OFC/NFOEC, paper M3A.6 (2014).
14. Z. Li, L. Chen, L. Zeng, S. Lin, and W. H. Fong, “Efficient encoding of quasi-cyclic low-density parity-check codes,” IEEE Trans. Commun. 54(1), 71–81 (2006).
15. D. Zou and I. B. Djordjevic, “FPGA-based rate-adaptive LDPC-coded modulation for the next generation of optical communication systems,” Opt. Express 24(18), 21159–21166 (2016).
16. L. C. Andrews, R. L. Phillips, and C. Y. Hopen, Laser Beam Scintillation with Applications (SPIE, 2001).