# A 32-Gb/s PAM-4 Quarter-Rate Clock and Data Recovery Circuit with an Input Slew-rate Tolerant Selective Transition Detector

Dae-Hyun Kwon, Minkyu Kim, Sung-Geun Kim, and Woo-Young Choi

Abstract—We present a 32-Gb/s PAM-4 quarter-rate clock and data recovery (CDR) circuit with a newly proposed selective transition detector (STD). The STD simplifies phase detection of PAM-4 data by performing middle-transition elimination and majority voting with simple logic gates. In addition, by using the edge-rotating technique with quarter-rate operation, our CDR reduces power consumption and chip area. A prototype 32-Gb/s quarter-rate PAM-4 CDR circuit is realized in 28-nm CMOS technology. The CDR circuit consumes 32 mW from a 1.2-V supply, and the recovered clock has 0.0136-UI rms jitter.

Index Terms—Bang-bang phase detector, Clock and data recovery (CDR), high speed serial link, multiphase, PAM-4 receiver

## I. INTRODUCTION

As the amount of data to be transmitted in many applications continues to increase, the use of multiple data levels has become an efficient way to increase transmission throughput without increasing the clock frequency. In particular, PAM-4 signaling is now widely considered for many electrical and optical wireline applications due to its enhanced spectral efficiency [1], [2].

However, with PAM-4 signaling, the clock and data recovery (CDR) circuit becomes complicated since the phase detector should be able to determine correct phase information from various transitions among multiple data levels [3], [4].

For bang-bang phase detector (BBPD) operation on PAM-4 data, the received data are first sampled with clock signals ($CK_{D0}$ and $CK_{D1}$ for data sampling and $CK_{E}$ for edge sampling) and compared with three different reference voltages ($V_H$, $V_M$, $V_L$). The comparator outputs are then processed with three pairs of XOR gates to produce the $UP_H/DN_H$, $UP_M/DN_M$, and $UP_L/DN_L$ signals, which carry the up/down information for each level, as shown in Fig. 1(a). They subsequently go through additional processing for middle transition ($00 \Leftrightarrow 10$ or $01 \Leftrightarrow 11$) elimination and majority voting before the final UP and DN signals are produced, as shown in Fig. 1(b). The middle transition causes a non-uniform jitter distribution [3], as graphically shown in Fig. 1(c). In particular, the amount of non-uniform jitter depends on the input data slew rate. PAM-4 transmitters often employ pre-emphasis, controlled through the current ratio [1], [5] or the output impedance [6], to enhance transmission bandwidth and/or distance, which changes the slew rate. With such PAM-4 transmitters, optimal design of the CDR would be very difficult.

Fig. 1. (a) Conventional BBPD for PAM-4 input signals, (b) CDR block diagram having a middle transition eliminator and majority voter for multiple UP/DN signals, and (c) non-uniform jitter distribution according to input slew-rate.

For middle transition elimination, logic gates are used for PAM-4 data gray-coded in the transmitter [3]. The logic gates also perform majority voting so that the number of outputs from comparators having different reference voltages can be reduced. In [4], both middle transition elimination and majority voting are done in the digital domain. However, performing these operations in the digital filter can cause a latency problem.

Dae-Hyun Kwon, Minkyu Kim, Sung-Geun Kim, and Woo-Young Choi are with the High-Speed Circuits & Systems Lab., Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea (e-mail: wchoi@yonsei.ac.kr). This work was supported by Samsung Electronics, the Materials and Parts Technology R&D Program funded by the Korean Ministry of Trade, Industry & Energy (Project No. 10065666), and the Graduate School of Yonsei University Research Scholarship Grants. The authors also thank the IC Design Education Center (IDEC) for EDA tool support.

Fig. 2. UP/DN numbers for PAM-4 transitions.

We propose a novel PAM-4 phase detector structure in which a newly proposed selective transition detector (STD) simultaneously performs middle transition elimination and majority voting with simple logic gates. In addition, we use the rotating phase detection scheme [7], [8] to realize a quarter-rate PAM-4 BBPD, which reduces both power consumption and chip area.

This brief is organized as follows. Section II explains the structure of our PAM-4 CDR, including the novel STD and the edge-rotating technique. Section III gives details of the circuit implementation of key building blocks. Section IV discusses measurement results of our prototype chip. Section V concludes the brief.

## II. PAM-4 CDR ARCHITECTURE

## A. Selective Transition Detector

Fig. 2 shows the numbers of possible UP and DN signals produced for each of the three types of PAM-4 data transitions (minor, middle, and major) as a function of $\Delta\theta$, the phase difference between the edge-sampling clock and the data transition. For minor transitions ($00 \Leftrightarrow 01$, $01 \Leftrightarrow 10$, or $10 \Leftrightarrow 11$), only one of the three UP signals ($UP_H$, $UP_M$, $UP_L$) becomes high when $\Delta\theta > 0$ and only one of the three DN signals ($DN_H$, $DN_M$, $DN_L$) becomes high when $\Delta\theta < 0$, which is the same as the usual BBPD characteristic. For middle transitions ($00 \Leftrightarrow 10$ or $01 \Leftrightarrow 11$), two of the three UP signals become high when $\Delta\theta > \theta_1$ and two of the three DN signals become high when $\Delta\theta < -\theta_1$. However, there is one UP signal and one DN signal when $-\theta_1 \le \Delta\theta \le \theta_1$. For the major transition ($00 \Leftrightarrow 11$), there are three UP signals when $\Delta\theta > \theta_2$ and three DN signals when $\Delta\theta < -\theta_2$, but two UP signals and one DN signal when $0 < \Delta\theta < \theta_2$, and one UP signal and two DN signals when $-\theta_2 < \Delta\theta < 0$.

Fig. 3. (a) Output of 3-input XOR and OR for different transitions, (b) UP operation, and (c) DN operation.
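The UP/DN counts described for Fig. 2 can be written down as a small lookup, shown here as a minimal Python sketch. The function name `up_dn_counts` and its argument names are illustrative only. The text does not say what happens exactly at $\Delta\theta = 0$ for minor and major transitions or at $|\Delta\theta| = \theta_2$, so the sketch rejects the former and assigns the latter to the inner region as an assumption.

```python
def up_dn_counts(kind, dtheta, theta1, theta2):
    """Return (number of UP signals, number of DN signals) for one transition.

    kind   : "minor", "middle" or "major"
    dtheta : phase difference between edge-sampling clock and data transition
    theta1 : boundary for middle transitions (same unit as dtheta)
    theta2 : boundary for the major transition (same unit as dtheta)
    """
    if dtheta == 0 and kind != "middle":
        # Not specified in the text; refuse rather than guess.
        raise ValueError("dtheta == 0 is undefined for minor/major transitions")
    late = dtheta > 0
    if kind == "minor":
        return (1, 0) if late else (0, 1)
    if kind == "middle":
        if abs(dtheta) <= theta1:
            return (1, 1)                      # one UP and one DN at once
        return (2, 0) if late else (0, 2)
    if kind == "major":
        if abs(dtheta) > theta2:
            return (3, 0) if late else (0, 3)
        # Assumption: |dtheta| == theta2 is grouped with the inner region.
        return (2, 1) if late else (1, 2)
    raise ValueError(f"unknown transition kind: {kind!r}")


if __name__ == "__main__":
    for kind in ("minor", "middle", "major"):
        for dtheta in (-0.2, -0.05, 0.05, 0.2):
            print(kind, dtheta, up_dn_counts(kind, dtheta, theta1=0.1, theta2=0.15))
```

The mixed cases, (1, 1) for middle transitions and (2, 1) or (1, 2) for the major transition, are the ones a plain per-level BBPD cannot resolve on its own.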

The middle transition information can be eliminated by applying a three-input XOR operation, which produces a high value when an odd number of inputs are high, on $UP_H$, $UP_M$, $UP_L$ to produce $UP_{XOR}$, and on $DN_H$, $DN_M$, $DN_L$ to produce $DN_{XOR}$. As shown in Fig. 3(a), for minor transitions, $UP_{XOR}$ and $DN_{XOR}$ have the same characteristics as the conventional BBPD. For middle transitions, $UP_{XOR}$ and $DN_{XOR}$ are both high only when $-\theta_1 \le \Delta\theta \le \theta_1$, which provides no UP or DN information. This corresponds to the 'Hold' status and eliminates the middle transition information. For major transitions, however, $UP_{XOR}$ and $DN_{XOR}$ do not have the desired characteristics. For $0 < \Delta\theta < \theta_2$, $UP_{XOR}$ becomes low although it should be high, and $DN_{XOR}$ becomes low for $-\theta_2 < \Delta\theta < 0$. This can be corrected with $UP_{OR}$ and $DN_{OR}$, which are produced with 3-input OR operations on $UP_H$, $UP_M$, $UP_L$ and on $DN_H$, $DN_M$, $DN_L$, respectively. All PAM-4 signal transition information can be obtained with proper logic combinations of the $UP_{XOR}$, $UP_{OR}$, $DN_{XOR}$, and $DN_{OR}$ logic values, as described in Table I. Combinations that do not occur are omitted from Table I.

| $UP_{XOR}$ | $UP_{OR}$ | $DN_{XOR}$ | $DN_{OR}$ | Transitions | Status |
|------------|-----------|------------|-----------|-------------|--------|
| 0 | 0 | 0 | 0 | No transition | Hold |
| 0 | 0 | 0 | 1 | # of DNs = 2 (middle transition) | Hold |
| 0 | 0 | 1 | 1 | # of DNs = 1 or 3 (minor or major transition) | DN |
| 0 | 1 | 0 | 0 | # of UPs = 2 (middle transition) | Hold |
| 0 | 1 | 1 | 1 | # of UPs = 2, # of DNs = 1 (major transition) | UP |
| 1 | 1 | 0 | 0 | # of UPs = 1 or 3 (minor or major transition) | UP |
| 1 | 1 | 0 | 1 | # of DNs = 2, # of UPs = 1 (major transition) | DN |
| 1 | 1 | 1 | 1 | # of DNs = 1, # of UPs = 1 (middle transition) | Hold |

Table I. PAM-4 signal transition states according to $UP_{XOR}$, $UP_{OR}$, $DN_{XOR}$, and $DN_{OR}$.

Simplifying the combinations in Table I with a Karnaugh map gives

$$UP = UP_{XOR} \cdot \overline{DN_{OR}} + UP_{OR} \cdot DN_{XOR} \quad (1)$$

$$DN = UP_{XOR} \cdot DN_{OR} + \overline{UP_{OR}} \cdot DN_{XOR} \quad (2)$$

As shown in Fig. 3(b) and (c), the final UP/DN signals produced by the STD show the same characteristics as a BBPD regardless of transition type. In our design, to minimize delay mismatches, identical logic gate structures composed of 2-input NAND gates are used for producing the $UP_{XOR}$, $UP_{OR}$, $DN_{OR}$, and $DN_{XOR}$ signals.
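Equations (1) and (2) can be checked against Table I with a few lines of Python. The sketch below is only a logic-level illustration, not the NAND-gate implementation; the names `std_outputs` and `TABLE_I` are introduced here for the example. One point worth noting: for the code 1111 (one UP and one DN), (1) and (2) both evaluate to 1, so the sketch assumes that simultaneous UP and DN cancel in the charge pump, which is how it arrives at the 'Hold' entry.

```python
def std_outputs(up, dn):
    """Apply Eqs. (1) and (2) to per-level UP/DN bits given as (H, M, L) tuples."""
    up_xor = up[0] ^ up[1] ^ up[2]
    up_or = up[0] | up[1] | up[2]
    dn_xor = dn[0] ^ dn[1] ^ dn[2]
    dn_or = dn[0] | dn[1] | dn[2]
    up_out = (up_xor & (1 - dn_or)) | (up_or & dn_xor)    # Eq. (1)
    dn_out = (up_xor & dn_or) | ((1 - up_or) & dn_xor)    # Eq. (2)
    return up_out, dn_out


# Table I keyed by (number of UPs, number of DNs); +1 = UP, -1 = DN, 0 = Hold.
TABLE_I = {
    (0, 0): 0, (0, 2): 0, (0, 1): -1, (0, 3): -1,
    (2, 0): 0, (2, 1): +1, (1, 0): +1, (3, 0): +1,
    (1, 2): -1, (1, 1): 0,
}


def bits_from_counts(n_up, n_dn):
    """Build non-overlapping (H, M, L) bit tuples with the requested counts."""
    up = tuple(int(i < n_up) for i in range(3))
    dn = tuple(int(i >= 3 - n_dn) for i in range(3))
    return up, dn


def net_output(n_up, n_dn):
    """Net STD decision, assuming simultaneous UP and DN cancel (Hold)."""
    up_out, dn_out = std_outputs(*bits_from_counts(n_up, n_dn))
    return up_out - dn_out


if __name__ == "__main__":
    for counts, expected in TABLE_I.items():
        print(counts, net_output(*counts), "expected", expected)
```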

To compare the operation of our STD with that of a conventional BBPD, behavior-level simulations are performed with PAM-4 data having 9-mUI rms jitter. The BBPD structure shown in Fig. 1(a), with and without the STD, is simulated with an ideal charge pump having a current of $50\,\mu$A. Fig. 4 shows the simulation results when the PAM-4 input data have three different input slew rates. $T_T$ is the input data rise/fall time, and $T_D$ represents one UI. As shown in Fig. 4, our STD produces the desired characteristics regardless of input slew-rate variations, whereas the PD characteristics of the conventional structure change significantly when the input slew-rate changes.

Fig. 4. Ideal simulation of (a) conventional phase detector gain with variations of input slew-rate, and (b) phase detector having an STD with variations of input slew-rate.

Fig. 5. (a) Generation of rotational signal, and (b) timing operations of edge-rotating BBPD.
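The slew-rate dependence can be illustrated with a much simpler model than the behavior-level simulation described in the text. The Python sketch below is not that simulation: it is noise-free (no 9-mUI jitter, no charge pump), it assumes linear-ramp transitions of duration $T_T$ between normalized levels, and it takes the "conventional" output to be the plain sum of the three UP/DN pairs. All names are illustrative. Under the linear-ramp assumption, the thresholds are crossed at $\pm T_T/4$ for middle transitions and at $0$ and $\pm T_T/3$ for the major transition, which is where the dependence on $T_T$ enters.

```python
LEVELS = (-3, -1, 1, 3)      # normalized PAM-4 levels (assumed)
THRESHOLDS = (2, 0, -2)      # V_H, V_M, V_L placed midway between levels (assumed)


def bbpd(a, b, dtheta, t_t):
    """Per-threshold (UP, DN) lists, ordered H, M, L, for a transition a -> b.

    The waveform is a linear ramp of duration t_t centred on the nominal
    transition; dtheta is the edge-clock offset in the same unit as t_t.
    """
    frac = min(1.0, max(0.0, dtheta / t_t + 0.5))
    edge = a + (b - a) * frac
    up, dn = [], []
    for v in THRESHOLDS:
        d0, e, d1 = a > v, edge > v, b > v
        up.append(int(d0 != e))   # edge sample already past this threshold
        dn.append(int(e != d1))   # edge sample has not reached it yet
    return up, dn


def conventional(up, dn):
    """Assumed baseline: all three UP/DN pairs are summed directly."""
    return sum(up) - sum(dn)


def std(up, dn):
    """Net output of the STD logic of Eqs. (1) and (2)."""
    up_xor, dn_xor = sum(up) % 2, sum(dn) % 2
    up_or, dn_or = int(any(up)), int(any(dn))
    up_out = (up_xor & (1 - dn_or)) | (up_or & dn_xor)
    dn_out = (up_xor & dn_or) | ((1 - up_or) & dn_xor)
    return up_out - dn_out


def average_output(detector, dtheta, t_t):
    """Mean detector output over the 12 possible level-to-level transitions."""
    pairs = [(a, b) for a in LEVELS for b in LEVELS if a != b]
    return sum(detector(*bbpd(a, b, dtheta, t_t)) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    for t_t in (0.2, 0.4, 0.6):                   # rise/fall time in UI
        for dtheta in (-0.1, -0.02, 0.02, 0.1):   # edge-clock offset in UI
            print(t_t, dtheta,
                  round(average_output(conventional, dtheta, t_t), 3),
                  round(average_output(std, dtheta, t_t), 3))
```

In this simplified model the STD column depends only on the sign of the offset, while the conventional column changes with both the offset and $T_T$, in line with the trend reported for Fig. 4.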

## B. Edge-Rotating Technique

Although our STD can be used for any type of CDR, our CDR is implemented as a quarter-rate design so that the burden of buffering high-speed signals is reduced and a simple ring-type VCO, which occupies much less chip area than an LC VCO, can be used. To further reduce the complexity of the CDR, we use the edge-rotating technique [7], in which the locking point is determined with a single clock phase selected from sequentially rotating phases. Fig. 5 shows the generation of the rotational signal and the timing operation of the CDR employing the edge-rotating technique. A dividing ratio of 16 is used in our design so that the rotation rate is higher than the CDR loop bandwidth and the edge rotation causes no CDR performance degradation [8].
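The clock rates implied by these choices follow from simple arithmetic, sketched below in Python. The 32-Gb/s data rate, the quarter-rate operation, and the dividing ratio of 16 come from the text. The full-rotation rate additionally assumes that the selected edge phase advances once per divided-clock cycle and that four phases are visited in turn.

```python
BIT_RATE = 32e9        # b/s, from the text
BITS_PER_SYMBOL = 2    # PAM-4 carries two bits per symbol
RATE_DIVISION = 4      # quarter-rate sampling clock
DIV_RATIO = 16         # frequency divider ratio, from the text
NUM_EDGE_PHASES = 4    # edge-sampling phases visited in turn

symbol_rate = BIT_RATE / BITS_PER_SYMBOL     # symbols per second
clock_freq = symbol_rate / RATE_DIVISION     # quarter-rate clock frequency
ck_div_freq = clock_freq / DIV_RATIO         # divided clock frequency

# Assumption: one phase step per divided-clock cycle.
full_rotation_freq = ck_div_freq / NUM_EDGE_PHASES

if __name__ == "__main__":
    print(f"symbol rate        : {symbol_rate / 1e9:g} GBaud")
    print(f"quarter-rate clock : {clock_freq / 1e9:g} GHz")
    print(f"divided clock      : {ck_div_freq / 1e6:g} MHz")
    print(f"full rotation      : {full_rotation_freq / 1e6:g} MHz")
```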



Fig. 6. Block diagram of proposed PAM-4 CDR.



Fig. 7. Chip microphotograph and measurement setup.

One of the edge-sampling clocks ($CK_{E0-3}$) is selected and used for sampling according to $T_{0-3}$, which rotates in synchronization with the divided clock ($CK_{DIV}$). The data-sampling clocks ($CK_{D0-3}$) are continuously supplied so that data are recovered without loss. Compared with a conventional multi-phase PAM-4 CDR, our CDR with the edge-rotation scheme saves 9 comparators and 3 clock buffers. As a result, our CDR has 39% less power consumption and 18% less chip area when designed in 28-nm CMOS, even though it requires the addition of a frequency divider, a rotational signal generator, a 4:1 MUX, and three 4:2 MUXs, as shown in Fig. 6.
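The comparator and clock-buffer savings can be reproduced with a short count, sketched below in Python. It assumes that a conventional quarter-rate design uses four data phases and four edge phases, each with three comparators (one per reference voltage) and one clock buffer; this baseline is an assumption made for illustration, chosen because it agrees with the stated savings.

```python
COMPARATORS_PER_PHASE = 3   # one per reference voltage: V_H, V_M, V_L


def comparator_count(data_phases, edge_phases):
    """Comparators needed when every sampling phase has its own set."""
    return COMPARATORS_PER_PHASE * (data_phases + edge_phases)


def clock_buffer_count(data_phases, edge_phases):
    """Assumption: one clock buffer per sampling phase."""
    return data_phases + edge_phases


# Assumed conventional quarter-rate baseline: 4 data phases and 4 edge phases.
conventional_comparators = comparator_count(4, 4)
conventional_buffers = clock_buffer_count(4, 4)

# Edge rotation: 4 data phases, but only one edge-sampling path in use at a time.
rotating_comparators = comparator_count(4, 1)
rotating_buffers = clock_buffer_count(4, 1)

if __name__ == "__main__":
    print("comparators saved  :", conventional_comparators - rotating_comparators)
    print("clock buffers saved:", conventional_buffers - rotating_buffers)
```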

## III. CIRCUIT IMPLEMENTATION

Fig. 6 shows the block diagram of our quarter-rate edge-rotating PAM-4 CDR with the STD. It is composed of 15 comparators, three 4:2 MUXs, one 4:1 MUX, a frequency divider, a rotational signal generator, a charge pump, a multiphase VCO, and a quarter-rate STD. The reference voltages ($V_H$, $V_M$, and $V_L$) for the comparators are externally provided. The charge pump has the structure given in [9]. The loop filter is implemented off-chip. The Lee-Kim delay cell [10] is used for producing multiphase clocks. The divided-by-16 clock signal is used for generating the rotating signals.

Fig. 8. Eye-diagrams of (a) PAM-4 input, and (b) recovered clock.

The 4:1 MUX for rotating the edge-sampling clocks is designed with the structure shown in Fig. 6. 4:1 dummy buffers (DBs) having the same structure are also used to minimize the phase skew between the data-sampling and edge-sampling clocks. The same scheme is used for the 4:2 MUXs and 4:2 DBs at the BBPD outputs to minimize the skew between the data sampled by $CK_{EN}$ and by $CK_{D0-3}$.



Fig. 9. Measured jitter tolerance.

As shown in Fig. 6, a 2-bit counter and a 2-to-4 binary decoder generate the 4-bit digital code ($T_0$, $T_1$, $T_2$, $T_3$) for selecting the correct edge-tracking clock and sampled data outputs in synchronization with the divided-by-16 clock signal.
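The counter and decoder can be modelled behaviorally as follows. This Python sketch assumes that the counter increments once per divided-clock cycle and that the phases are visited in the order $T_0$, $T_1$, $T_2$, $T_3$; neither detail is stated explicitly in the text. The helper names `rotation_codes` and `select_edge_clock` are illustrative.

```python
def rotation_codes(n_div_cycles):
    """Yield the one-hot code (T0, T1, T2, T3) for each divided-clock cycle."""
    count = 0                                   # 2-bit counter state
    for _ in range(n_div_cycles):
        # 2-to-4 binary decoder: exactly one output is high.
        yield tuple(int(count == k) for k in range(4))
        count = (count + 1) & 0b11              # wrap around after 3


def select_edge_clock(code, edge_clocks):
    """4:1 MUX: pick the edge-sampling clock enabled by the one-hot code."""
    return edge_clocks[code.index(1)]


if __name__ == "__main__":
    clocks = ("CK_E0", "CK_E1", "CK_E2", "CK_E3")
    for code in rotation_codes(6):
        print(code, select_edge_clock(code, clocks))
```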

The PAM-4 decoder recovers the PAM-4 signal into eight deserialized lanes. As shown in Fig. 6, the recovered and deserialized MSB is produced by sharing the sampler used in the BBPD, and the LSB is produced by 3-input XORs using the BBPD outputs ($D_{H0}$, $D_{M0}$, $D_{L0}$ when the sampling clock is $CK_{D0}$).
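The decoding of one sample can be sketched in Python as below. The sketch assumes that the MSB is taken from the comparator referenced to $V_M$ and that the three comparator outputs form a thermometer code; the text only states that the MSB shares a BBPD sampler. The level and threshold values in the usage example are normalized placeholders.

```python
def pam4_decode(sample, v_h, v_m, v_l):
    """Return (MSB, LSB) for one data sample using three comparator decisions."""
    d_h = int(sample > v_h)
    d_m = int(sample > v_m)
    d_l = int(sample > v_l)
    msb = d_m                    # assumption: MSB reuses the V_M comparator output
    lsb = d_h ^ d_m ^ d_l        # 3-input XOR of the comparator outputs
    return msb, lsb


if __name__ == "__main__":
    # Normalized levels -3, -1, 1, 3 with thresholds midway between them.
    for level in (-3, -1, 1, 3):
        print(level, pam4_decode(level, v_h=2, v_m=0, v_l=-2))
```

With these assumptions the four levels decode, from lowest to highest, to 00, 01, 10, and 11, which matches the labeling used for the minor transitions in Section II.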

## IV. MEASUREMENT RESULTS

A prototype quarter-rate 32-Gb/s PAM-4 CDR with the STD is implemented in 28-nm CMOS technology. The chip microphotograph and the measurement setup are shown in Fig. 7. The circuit consumes 32 mW at a 1.2-V supply voltage and occupies 0.022 mm<sup>2</sup> excluding output buffers. The chip is mounted on an FR-4 printed circuit board and wire-bonded for measurement. A 2-channel pulse pattern generator (PPG) produces two 16-Gb/s PRBS 2<sup>7</sup>-1 data sequences for the MSB and LSB, which are combined with a power combiner and applied to our CDR. The recovered deserialized NRZ data and clock signals are measured with a digital sampling scope, and the bit error rate is measured with a BERT. No error was observed in any of the 8 lanes while $4 \times 10^{12}$ bits were transmitted.

Fig. 8 shows the eye diagrams of the input PAM-4 data and the recovered clock signal. No error was observed while $4 \times 10^{12}$ bits were transmitted, and the recovered clock has an rms jitter of 0.0136 UI, which is mostly due to the ring-type VCO used in our CDR.

Fig. 9 shows the result of the jitter tolerance measurement for BER less than 10<sup>-12</sup> with PRBS 2<sup>7</sup>-1 input data. Our CDR satisfies the jitter tolerance mask for CEI-56G-VSR.

The performance of our CDR is compared in Table II with those of previously reported PAM-4 CDRs. As can be seen in the table, our CDR has smaller power consumption and chip area. However, our CDR shows worse jitter performance for the recovered clock. This is primarily due to the ring-type VCO that we used. An external clock is used in [4] and an LC VCO is used in [11], both of which should provide much better jitter performance for the recovered clock signal.

|                            | [4]                                                            | [11]                                                       | This work                                                   |
|----------------------------|----------------------------------------------------------------|------------------------------------------------------------|-------------------------------------------------------------|
| Data rate (Gb/s)           | 22                                                             | 54.1-56.8                                                  | 32                                                          |
| Receiver clock jitter      | $J_{\rm rms}$ = 1.64 ps<br>$J_{\rm pkp}$ = 13.3 ps<br>@ 1.4 GHz | $J_{\rm rms}$ = 0.53 ps<br>$J_{\rm pkp}$ = 2 ps<br>@ 28 GHz | $J_{\rm rms}$ = 3.8 ps<br>$J_{\rm pkp}$ = 22.2 ps<br>@ 4 GHz |
| Power consumption (mW)     | 228\*                                                          | 180                                                        | 32                                                          |
| Power efficiency (mW/Gb/s) | 10.4                                                           | 3.2                                                        | 1                                                           |
| Chip area (mm<sup>2</sup>) | 1                                                              | 1.6                                                        | 0.022                                                       |
| Technology                 | 90-nm SOI CMOS                                                 | 40-nm CMOS                                                 | 28-nm CMOS                                                  |

<sup>\*</sup> Includes PRBS checker.
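The power-efficiency row can be reproduced by dividing power consumption by data rate, as in the short Python check below. For [11], the table value appears to correspond to the upper end of the reported 54.1-56.8 Gb/s range; this is an inference from the numbers, not something stated in the text. Note that 1 mW/(Gb/s) is equivalent to 1 pJ/bit.

```python
# (power in mW, data rate in Gb/s) taken from Table II.
designs = {
    "[4]": (228.0, 22.0),
    "[11]": (180.0, 56.8),     # assumption: upper end of the 54.1-56.8 Gb/s range
    "This work": (32.0, 32.0),
}

# Power efficiency in mW/(Gb/s), numerically equal to pJ/bit.
efficiency = {name: power / rate for name, (power, rate) in designs.items()}

if __name__ == "__main__":
    for name, value in efficiency.items():
        print(f"{name:>9}: {value:.1f} mW/(Gb/s)")
```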

## V. CONCLUSION

A quarter-rate 32-Gb/s PAM-4 CDR having a novel phase detector structure, the STD, is demonstrated. The STD produces the desired UP/DN signals with simple logic gates. In addition, the edge-rotating technique reduces power consumption and chip area. A prototype CDR realized in 28-nm CMOS technology successfully demonstrates error-free 32-Gb/s operation.

## REFERENCES

- [1] H. Ju, M. C. Choi, G. S. Jeong, W. Bae and D. K. Jeong, "A 28 Gb/s 1.6 pJ/b PAM-4 Transmitter Using Fractionally Spaced 3-Tap FFE and G<sub>m</sub>-Regulated Resistive-Feedback Driver," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 12, pp. 1377-1381, Dec. 2017.
- [2] C. Cole, "PAM-N Tutorial Material," *IEEE 802.3 Plenary Sess.*, Mar. 2012.
- [3] J. L. Zerbe *et al.*, "Equalization and Clock Recovery for a 2.5-10-Gb/s 2-PAM/4-PAM Backplane Transceiver Cell," *IEEE J. Solid-State Circuits*, vol. 38, no. 12, pp. 2121–2130, 2003.
- [4] T. Toifl et al., "A 22-Gb/s PAM-4 receiver in 90-nm CMOS SOI technology," *IEEE J. Solid-State Circuits*, vol. 41, no. 4, pp. 954–964, 2006.
- [5] A. Nazemi *et al.*, "A 36Gb/s PAM4 transmitter using an 8b 18GS/s DAC in 28nm CMOS," *Dig. Tech. Pap. - IEEE Int. Solid-State Circuits Conf.*, vol. 58, pp. 58–59, 2015.
- [6] B. Hu, Y. Du, R. Huang, J. Lee, Y. K. Chen and M. C. F. Chang, "A Capacitor-DAC-Based Technique For Pre-Emphasis-Enabled Multilevel Transmitters," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 64, no. 9, pp. 1012-1016, Sept. 2017.
- [7] H. Li *et al.*, "A 0.8V, 560fJ/bit, 14Gb/s injection-locked receiver with input duty-cycle distortion tolerable edge-rotating 5/4X subrate CDR in 65nm CMOS," *IEEE Symp. VLSI Circuits, Dig. Tech. Pap.*, pp. 2–3, 2014.
- [8] D.-H. Kwon, Y.-S. Park, and W.-Y. Choi, "A clock and data recovery circuit with programmable multi-level phase detector characteristics and a built-in jitter monitor," *IEEE Trans. Circuits Syst. I Regul. Pap.*, vol. 62, no. 6, 2015.
- [9] J. S. Lee, W. K. Jin, D. M. Choi, G. S. Lee, and S. Kim, "A wide range pll for 64x speed CD-ROMS & 10X speed DVD-ROMS," *IEEE Trans. Consum. Electron.*, vol. 46, no. 3, pp. 487–493, 2000.
- [10] J. Lee and B. Kim, "A low-noise fast-lock phase-locked loop with adaptive bandwidth control," *Phase-Locking High-Performance Syst. From Devices to Archit.*, vol. 35, no. 8, pp. 430–438, 2003.
- [11] J. Lee, P. C. Chiang, P. J. Peng, L. Y. Chen, and C. C. Weng, "Design of 56 Gb/s NRZ and PAM4 SerDes transceivers in CMOS technologies," *IEEE J. Solid-State Circuits*, vol. 50, no. 9, pp. 2061–2073, 2015.

Table II. PAM-4 CDR performance comparison.

