# A 10-Gb/s Multiphase Clock and Data Recovery Circuit with a Rotational Bang-Bang Phase Detector

Dae-Hyun Kwon, Jinsoo Rhim, and Woo-Young Choi

Abstract—A multiphase clock and data recovery (CDR) circuit having a novel rotational bang-bang phase detector (RBBPD) is demonstrated. The proposed 1/4-rate RBBPD decides the locking point using a single clock phase selected among four sequentially rotating clock phases. With this, our RBBPD has significantly reduced power consumption and chip area. A prototype 10-Gb/s 1/4-rate CDR with RBBPD is successfully realized in 65-nm CMOS technology. The CDR consumes 5.5 mW from a 1-V supply, and the clock signal recovered from 2<sup>31</sup>-1 PRBS input data has 0.011-UI rms jitter.

Index Terms—Bang-bang phase detector, clock and data recovery, multiphase

## I. INTRODUCTION

The clock and data recovery (CDR) circuit is one of the most critical building blocks that determine the overall transceiver performance in serial data communication systems. Recently, increasing demands for higher data-rate systems are making CDR design very challenging. Multiphase CDRs with bang-bang phase detectors (BBPDs) are widely used for high-speed applications [1, 2], as they avoid the speed bottleneck by using sub-rate clocks, and the binary nature of the BBPD allows relatively easy implementation. However, the multiphase structure can consume a large amount of power and requires a large chip area. Previously, a charge-steering latch was used for the sampler, dramatically reducing power consumption [3], but it requires two capacitors per latch, resulting in a relatively large chip area. A single edge-tracking method has also been used to reduce power and chip area [4], but it requires 9b/10b encoding and a preamble to compensate for the degraded jitter-tracking bandwidth, so it cannot be used for all applications.

Fig. 1. (a) Conventional 1/4-rate BBPD CDR, (b) conventional BBPD operation.

Fig. 2. (a) 1/4-rate RBBPD CDR, (b) RBBPD operation.

Manuscript received Jul. 19, 2015; accepted Jan. 13, 2016. Dept. of Electrical and Electronic Engineering, Yonsei University, Seoul 120-794, Korea

E-mail: wchoi@yonsei.ac.kr

In this paper, we demonstrate a relatively simple technique for reducing the power and chip area of the multiphase CDR. Our technique is based on a novel rotational BBPD (RBBPD), which selects one edge-tracking clock among four sequentially rotating edge-tracking clocks.

This paper is organized as follows. In Section II, we explain our multiphase RBBPD CDR structure and its circuit implementation. Section III presents measurement results of a prototype chip. Section IV gives the conclusion.

Fig. 3. Locking process of (a) BBPD, (b) RBBPD in behavioral simulation.

## II. RBBPD STRUCTURE

Fig. 1(a) shows the structure of a typical 1/4-rate CDR [2] having 4 BBPDs and 4 charge pumps. Among the 8 clock signals generated from the VCO, 4 ($CK_{0,2,4,6}$) are used for data sampling, producing $D_0$, $D_2$, $D_4$, and $D_6$, and the rest ($CK_{1,3,5,7}$) are used for edge tracking, producing $D_1$, $D_3$, $D_5$, and $D_7$, as shown in Fig. 1(b). The lead and lag signals produced by the BBPDs are converted into currents by the charge pumps, then summed and averaged in the loop filter.
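The lead/lag decision of a single BBPD can be illustrated with a minimal Python sketch. It assumes the common two-XOR (Alexander-type) arrangement, which is consistent with the two XORs per BBPD implied by the component counts given later in this section. The function name `bbpd`, its argument names, and the `early`/`late` polarity are illustrative assumptions and are not taken from Fig. 1.

```python
def bbpd(d_prev: int, d_edge: int, d_next: int) -> tuple[int, int]:
    """Two-XOR bang-bang phase decision for one pair of data samples.

    d_prev and d_next are consecutive data samples (e.g. D0 and D2) and
    d_edge is the edge sample taken between them (e.g. D1).
    Returns (early, late); the polarity naming is an assumption.
    """
    early = d_edge ^ d_next  # edge sample still equals the previous bit
    late = d_prev ^ d_edge   # edge sample already equals the next bit
    return early, late


# A rising transition sampled early, the same transition sampled late,
# and a run of identical bits (no transition, so no decision).
for samples in [(0, 0, 1), (0, 1, 1), (1, 1, 1)]:
    print(samples, bbpd(*samples))
```

When there is no data transition, both outputs stay at 0, so the charge pump receives no correction for that bit pair.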

Fig. 4. (a) 2-bit counter, (b) 2-to-4 binary decoder, (c) timing diagram of the decoder.

Our RBBPD has only one BBPD, as shown in Fig. 2(a). The edge-tracking signal is provided by $DFF_{\rm E}$, whose clock signal is selected from $CK_{1,3,5,7}$ with control bits $T_{0,1,2,3}$, and is supplied to the BBPD. The sampled data signals required for the BBPD ($D_A$ and $D_B$) are selected from the $DFF_{0,2,4,6}$ output signals with $T_{0,1,2,3}$ so that the correct combination of edge-tracking and data-sampling signals is achieved. The table in Fig. 2(a) shows the resulting BBPD input combinations for each $T_{0,1,2,3}$ setting. Only one of the four edge-tracking clocks is active at a time, and the $T_{0,1,2,3}$ setting changes every 32 clock cycles, as determined by the frequency divider. The data transition detection density of our RBBPD CDR is therefore 1/4 of that of the conventional multiphase CDR, as schematically shown in Fig. 2(b).
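The rotation of the selection bits can be sketched behaviorally as a 2-bit counter followed by a 2-to-4 decoder. This is a minimal sketch under stated assumptions: the table in Fig. 2(a) is not reproduced in the text, so the pairing of each edge clock with its neighboring data samples (`DATA_PAIRS`) and the order in which $T_0$ to $T_3$ are asserted are guesses based on the clock ordering of Fig. 1(b). All names in the code are hypothetical.

```python
EDGE_CLOCKS = ("CK1", "CK3", "CK5", "CK7")
# Assumed pairing: each edge clock sits between two adjacent data samples;
# the last pair wraps around to D0 of the following clock period.
DATA_PAIRS = (("D0", "D2"), ("D2", "D4"), ("D4", "D6"), ("D6", "D0"))
CYCLES_PER_SETTING = 32  # the T setting changes every 32 clock cycles


def decode_2to4(count: int) -> tuple[int, ...]:
    """2-to-4 binary decoder: one-hot (T0, T1, T2, T3) for a 2-bit count."""
    return tuple(int(i == count) for i in range(4))


def rbbpd_selection(clock_cycle: int):
    """Return (T bits, edge clock, (D_A, D_B) sources) for a clock cycle."""
    count = (clock_cycle // CYCLES_PER_SETTING) % 4  # 2-bit counter value
    t_bits = decode_2to4(count)
    selected = t_bits.index(1)
    return t_bits, EDGE_CLOCKS[selected], DATA_PAIRS[selected]


# One full rotation takes 4 x 32 = 128 clock cycles under these assumptions.
for cycle in (0, 32, 64, 96, 128):
    print(cycle, rbbpd_selection(cycle))
```

Exactly one T bit is high at any time, so the single BBPD always receives one edge sample together with the two data samples assumed to surround it.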

Compared to the conventional multiphase CDR, our RBBPD CDR saves 3 DFFs, 6 XORs, and 3 charge pumps, but requires an additional frequency divider, 2-bit counter, and 2-to-4 binary decoder, as can be seen by comparing Fig. 1(a) and Fig. 2(a). Since the additional blocks operate at a much lower speed than the saved blocks, our RBBPD CDR reduces the total power consumption as well as the chip area. This saving in power and area is achieved without any detrimental influence on CDR dynamics by rotating the edge-tracking clocks and data-sampling signals at a frequency higher than the CDR bandwidth. On the other hand, our RBBPD CDR has a smaller sampling density, since the RBBPD samples data edges 4 times less frequently than the conventional multiphase CDR. The influence of this difference can be easily mitigated by making the charge pump current four times larger.
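The sampling-density argument above can be checked with a small counting experiment. The sketch below is only illustrative: it uses seeded random bits in place of PRBS data, ignores the actual phase decisions, and simply counts how many data transitions each detector style gets to observe. The function and variable names are hypothetical.

```python
import random


def detected_transitions(bits, rotate_every=32):
    """Count data transitions observed by each detector style.

    bits holds 4 data bits per 1/4-rate clock cycle plus one trailing bit.
    Returns (conventional_count, rotational_count).
    """
    conventional = 0
    rotational = 0
    num_cycles = (len(bits) - 1) // 4
    for cycle in range(num_cycles):
        active_edge = (cycle // rotate_every) % 4  # edge used by the RBBPD
        for edge in range(4):
            i = 4 * cycle + edge
            transition = int(bits[i] != bits[i + 1])
            conventional += transition  # all four edges are checked
            if edge == active_edge:
                rotational += transition  # only the selected edge is checked
    return conventional, rotational


rng = random.Random(0)  # fixed seed; random bits stand in for PRBS data
bits = [rng.getrandbits(1) for _ in range(4 * 12800 + 1)]
conventional, rotational = detected_transitions(bits)
print("density ratio:", rotational / conventional)
print("drive ratio with 4x current:", 4 * rotational / conventional)
```

For long data patterns the density ratio is close to 1/4, so a charge pump current that is four times larger gives roughly the same average correction per unit time as four BBPDs each driving $I_{\rm CP}$.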

Fig. 5. (a) 8-phase VCO, (b) delay cell.

Fig. 3(a) and (b) show the behavioral simulation results for the CDR control voltages when 10-Gb/s $2^{31}-1$ PRBS data are applied to the conventional multiphase CDR and our RBBPD CDR, respectively. For the simulation, our RBBPD CDR has a charge pump current of 500 $\mu$A ($4I_{\rm CP}$), which is four times larger than the conventional CDR charge pump current ($I_{\rm CP}$). The clock rotating frequency is 78.125 MHz, which is 1/32 of the recovered clock frequency. As can be seen in the figures, the locking dynamics of the two types of CDRs are very similar. However, our RBBPD CDR shows larger dithering jitter because its charge pump current dithers among $+4I_{\rm CP}$, 0, and $-4I_{\rm CP}$, whereas in the conventional CDR it dithers among $+4I_{\rm CP}$, $+2I_{\rm CP}$, 0, $-2I_{\rm CP}$, and $-4I_{\rm CP}$, which results in a smaller RMS value for the dithering jitter.
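The frequencies and currents quoted for the simulation follow from simple arithmetic, sketched below. The constant names are illustrative. The only quantity not stated explicitly in the text is the rate at which a full four-setting rotation repeats, which is derived here on the assumption that each of the four settings lasts one divided-by-32 period.

```python
DATA_RATE_HZ = 10e9          # 10-Gb/s input data
CLOCK_DIVISION = 4           # 1/4-rate clocking
ROTATION_DIVIDER = 32        # divide-by-32 frequency divider
NUM_SETTINGS = 4             # four T settings per full rotation (assumed)
RBBPD_CP_CURRENT_A = 500e-6  # 4 x I_CP in the behavioral simulation

clock_hz = DATA_RATE_HZ / CLOCK_DIVISION
rotation_hz = clock_hz / ROTATION_DIVIDER
full_rotation_hz = rotation_hz / NUM_SETTINGS
conventional_cp_current_a = RBBPD_CP_CURRENT_A / 4

print(f"recovered clock frequency: {clock_hz / 1e9} GHz")
print(f"clock rotating frequency:  {rotation_hz / 1e6} MHz")
print(f"full rotation repeats at:  {full_rotation_hz / 1e6} MHz")
print(f"conventional I_CP:         {conventional_cp_current_a * 1e6:.1f} uA")
```

Under these assumptions the recovered clock is 2.5 GHz and the rotating frequency is 78.125 MHz, as stated above. The full rotation repeats at about 19.5 MHz, which is in line with the spur frequency reported with the phase noise measurement in Section III.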

## III. MEASUREMENT RESULTS

A prototype 1/4-rate 10-Gb/s multiphase CDR with RBBPD is implemented in 65-nm CMOS technology. The 4-to-1 multiplexers used for clock signal and sampled data selection are composed of 4 pass gates. Dummy buffers are added for the VCO ($CK_{0,2,4,6}$) and $DFF_{\rm E}$ output signals in order to prevent delay skews, as shown in Fig. 2(a). The 2-bit counter (Fig. 4(a)) and 2-to-4 binary decoder (Fig. 4(b)) produce the 4-bit digital code ($T_{0,1,2,3}$) for selecting the correct edge-tracking clock and sampled data outputs in synchronization with the divided-by-32 clock signal. Fig. 4(c) shows the timing diagram for the counter and decoder output signals. Fig. 5 shows the structure of the 8-phase VCO [7] with external coarse frequency tuning and duty cycle correctors, which compensate duty cycle distortions caused by the pseudo-differential delay cell. An off-chip resistor and a capacitor are used for the loop filter implementation. Fig. 6 shows the chip microphotograph. The CDR, excluding the output buffers, consumes 5.5 mW from a 1-V supply and occupies 3610 μm<sup>2</sup>. The fabricated chip is mounted on an FR-4 printed circuit board and wire-bonded for measurement. Fig. 6 also shows the measurement setup for evaluating CDR performance. A pulse pattern generator (PPG) produces 10-Gb/s 2<sup>31</sup>-1 PRBS data, and the recovered clock and data are measured by a digital sampling scope and a signal source analyzer. The bit error rate tester (BERT) checks whether the CDR produces any errors when jitter is injected into the input data. Fig. 7 shows the measured eye diagrams for the recovered clock and data. The recovered clock has a jitter of 11.25 mUI<sub>rms</sub>.

Fig. 6. Chip microphotograph and measurement setup.

Fig. 7. Eye diagrams of (a) recovered clock, (b) recovered data.

Fig. 8. Measured phase noise.

Fig. 9. Measured jitter tolerance at 10 Gb/s.

Fig. 8 shows the phase noise of the recovered clock. The spurs observed at 19.5 MHz and its harmonics are due to periodic switching in the 4-to-1 multiplexers. Fig. 9 shows the result of the jitter tolerance measurement for a BER of less than 10<sup>-12</sup> with 2<sup>31</sup>-1 PRBS input data. Although the number of data edges our CDR samples in a given time interval is four times smaller than in the conventional multiphase CDR, our CDR does not suffer from jitter-tracking bandwidth degradation.

Our CDR has 3 DFFs, 6 XORs, and 3 charge pumps fewer than the conventional multiphase CDR, but requires an additional frequency divider, 2-bit counter, and 2-to-4 binary decoder. When designed in 65-nm CMOS technology, 130 $\mu$W and 120 $\mu$m<sup>2</sup> are needed for the DFF, 75 $\mu$W and 86 $\mu$m<sup>2</sup> for the XOR, 500 $\mu$W and 432 $\mu$m<sup>2</sup> for the charge pump, 70 $\mu$W and 190 $\mu$m<sup>2</sup> for the frequency divider, and 75 $\mu$W and 370 $\mu$m<sup>2</sup> for the 2-bit counter and 2-to-4 binary decoder. With these, the conventional multiphase CDR would have a power consumption of 6.26 mW and a chip area of 4950 $\mu$m<sup>2</sup>, which correspond to 13.8 % more power and 37.2 % more chip area compared to our RBBPD CDR.

Table 1. Performance comparison with multi-rate CDR

|                                       | [3]   | [4]  | [8]      | This Work |
|---------------------------------------|-------|------|----------|-----------|
| Process (nm)                          | 65    | 180  | 130      | 65        |
| Supply (V)                            | 1     | 1.8  | 1.2      | 1         |
| Data rate f<sub>b</sub> (Gb/s)        | 25    | 6.93 | 3.24/5.4 | 10        |
| f<sub>b</sub>/f<sub>Clk</sub>         | 2     | 10   | 2        | 4         |
| Power consumption (mW)                | 4.97  | 26.2 | 138*     | 5.5       |
| Recovered clock RMS jitter (mUI)      | 19.5  | 4.2  | 16.1     | 11.25     |
| Power efficiency (mW/Gb/s)            | 0.199 | 3.4  | 19.3     | 0.55      |
| Die area (mm²)                        | 0.039 | 0.14 | 1.1**    | 0.003     |

<sup>\*</sup> including decoupling capacitors

<sup>\*\*</sup> including 2:1 MUX and output buffers

The performance of our RBBPD CDR is compared with previously reported multiphase CDRs based on BBPDs in Table 1. As can be seen in the table, our RBBPD CDR occupies the smallest chip area and achieves a relatively low power-efficiency figure (mW/Gb/s). The CDR reported in [3] achieves the lowest power-efficiency figure because it is based on an LC-VCO, which consumes a very small amount of power but occupies a large chip area. Our RBBPD is compatible with any multiphase CDR architecture based on BBPDs.
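As a quick check of the "This Work" column of Table 1, the power-efficiency figure and the absolute value of the recovered clock jitter can be recomputed from the measured numbers. This is a small illustrative sketch; the constant names are hypothetical, and the conversion to picoseconds assumes that 1 UI equals one bit period at 10 Gb/s.

```python
POWER_MW = 5.5          # CDR power, excluding output buffers
DATA_RATE_GBPS = 10.0   # input data rate in Gb/s
RMS_JITTER_MUI = 11.25  # recovered clock jitter in mUI rms

power_efficiency = POWER_MW / DATA_RATE_GBPS  # mW/Gb/s
unit_interval_ps = 1e3 / DATA_RATE_GBPS       # one bit period in ps
rms_jitter_ps = RMS_JITTER_MUI * 1e-3 * unit_interval_ps

print(f"power efficiency: {power_efficiency:.2f} mW/Gb/s")
print(f"unit interval:    {unit_interval_ps:.1f} ps")
print(f"rms jitter:       {rms_jitter_ps:.3f} ps")
```

The result matches the 0.55 mW/Gb/s listed in the table, and the 11.25-mUI jitter corresponds to about 1.1 ps rms for a 100-ps unit interval.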

## IV. CONCLUSIONS

A 1/4-rate 10-Gb/s multiphase CDR with a novel RBBPD is demonstrated. Our RBBPD requires only one BBPD and one charge pump and, consequently, it has significantly reduced power consumption and chip area compared to the conventional 1/4-rate multiphase CDR. A prototype chip fabricated in 65-nm CMOS technology successfully demonstrates that our RBBPD operates properly.

## ACKNOWLEDGMENTS

This work was supported by the National Research Foundation of Korea grant funded by the Korea government (MEST) [2015R1A2A2A01007772]. The authors are also thankful to IDEC for MPW and EDA software support.

## REFERENCES

- [1] J.-K. Kim, et al., "A Fully Integrated 0.13-um CMOS 40-Gb/s Serial Link Transceiver," *Solid-State Circuits, IEEE Journal of*, vol. 44, no. 5, pp. 1510–1521, May 2009.
- [2] J. Lee, et al., "A 40-Gb/s Clock and Data Recovery Circuit in 0.18-um CMOS Technology," *Solid-State Circuits, IEEE Journal of*, vol. 38, no. 12, pp. 2181–2190, Dec. 2003.
- [3] J. W. Jung, et al., "A 25-Gb/s 5mW CMOS CDR/deserializer," *Solid-State Circuits, IEEE Journal of*, vol. 48, no. 3, pp. 684–697, Mar. 2013.
- [4] K.-S. Kwak, et al., "Power-Reduction Technique Using a Single Edge-Tracking Clock for Multiphase Clock and Data Recovery Circuits," *Circuits and Systems II, IEEE Transactions on*, vol. 61, no. 4, pp. 239–243, Apr. 2014.
- [5] J. Lee, et al., "Analysis and Modeling of Bang-Bang Clock and Data Recovery Circuits," *Solid-State Circuits, IEEE Journal of*, vol. 39, no. 9, pp. 1571–1580, Sep. 2004.
- [6] D.-H. Kwon, et al., "A Clock and Data Recovery Circuit with Programmable Multi-Level Phase Detector Characteristics and a Built-in Jitter Monitor," *Circuits and Systems I, IEEE Transactions on*, vol. 62, no. 6, pp. 1472–1480, Jun. 2015.
- [7] J. Lee, et al., "A Low-Noise Fast-Lock Phase-Locked Loop with Adaptive Bandwidth Control," *Solid-State Circuits, IEEE Journal of*, vol. 29, no. 8, pp. 1482–1490, Dec. 1994.
- [8] W.-Y. Lee, et al., "A 5.4-Gb/s Clock and Data Recovery Circuit Using Seamless Loop Transition Scheme With Minimal Phase Noise Degradation," *Circuits and Systems I, IEEE Transactions on*, vol. 59, no. 11, pp. 2581–2528, Nov. 2012.



**Dae-Hyun Kwon** received the degrees in school of electrical and electronic engineering at Yonsei University, Seoul, Korea, in 2011. He is currently working toward the Ph.D. degree at Yonsei University. His research interests include clock and data recovery circuits for high-speed communication, and high-speed I/O interface circuits.



**Jinsoo Rhim** received the B.S. and M.S. degrees in electrical and electronic engineering from Yonsei University, Seoul, Korea, in 2009 and 2011, respectively, where he is currently working toward the Ph.D. degree. His research interests include high-speed interface circuits and silicon photonics for optical interconnects.



**Woo-Young Choi** received the B.S., M.S., and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology (MIT), Cambridge, in 1986, 1988, and 1994, respectively. From 1994 to 1995, he was a Post-Doctoral Research Fellow with NTT Opto-Electronics Laboratories in Japan. In 1995, he joined the Department of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea, where he is currently a Professor. His research interest is in the area of high-speed circuits and systems that include high-speed interface circuits and Si photonics.