

## 2023 IEEE Asian Solid-State Circuits Conference (A-SSCC)

## **ADVANCE PROGRAM**

November 5<sup>th</sup>-8<sup>th</sup>, 2023, Starbay Haikou Pullman Hotel, Haikou, Hainan, China

Sponsored by IEEE Solid-State Circuits Society (IEEE SSCS)



| Session          | 10: Advanced Wireline Transceiver Techniques |
|------------------|----------------------------------------------|
| Time             | Tuesday, Nov. 7th, 10:50-12:30               |
| Venue            | Coral Hall                                   |
| Session Chair    | Peng Liu, Zhejiang University                |
| Session Co-Chair | Ilmin Yi, GIST                               |

#### 10-1

10:50~11:15

#### A 4×112-Gb/s PAM-4 Silicon-Photonic Transceiver Front-End for Linear-Drive Co-Packaged Optics

Han Liu<sup>1,2</sup>, Nan Qi<sup>1,2</sup>, Donglai Lu<sup>1,2</sup>, Zizheng Dong<sup>1</sup>, Zhihan Zhang<sup>1,2</sup>, Jian He<sup>1</sup>, Guike Li<sup>1,2</sup>, Leliang Li<sup>1,2</sup>, Ye Liu<sup>3</sup>, Ziyue Dang<sup>3</sup>, Daigao Chen<sup>3,4</sup>, Zhao Zhang<sup>1,2</sup>, Jian Liu<sup>1,2</sup>, Nanjian Wu<sup>1,2</sup>, Xi Xiao<sup>3,4</sup>, Liyuan Liu<sup>1,2</sup>

<sup>1</sup>Institute of Semiconductors, Chinese Academy of Sciences

<sup>2</sup>University of Chinese Academy of Sciences

<sup>3</sup>National Information Optoelectronics Innovation Center

<sup>4</sup>Wuhan Research Institute of Posts & Telecommunications

#### 10-2

11:15~11:40

#### A 80Gb/s/pin Single-Ended PAM-4 Transmitter with an Edge Boosting Auxiliary Driver and a 4-Tap FFE in 28-nm CMOS

Dae-Won Rho<sup>1</sup>, Jae-Koo Park<sup>1,2</sup>, Seung-Jae Yang<sup>1</sup>, and Woo-Young Choi<sup>1</sup>

<sup>1</sup>Yonsei University

<sup>2</sup>Samsung Electronics

#### 10-3

11:40~12:05

#### A 2 × 24Gb/s Single-Ended Transceiver with Channel-Independent Encoder-Based Crosstalk Cancellation in 28nm CMOS

Hongzhi Wu, Weitao Wu, Liping Zhong, Xuxu Cheng, Xiongshi Luo, Zhenghao Li, Dongfan Xu, Quan Pan

Southern University of Science and Technology

#### 10-4

12:05~12:17

#### A Time-Based PAM-4 Transceiver Using Single Path Decoder and Fast-Stochastic Calibration Techniques

Dong-Hyun Yoon<sup>1</sup>, He Junsen<sup>1</sup>, Kwang-Hyun Baek<sup>2</sup>, Youngdon Choi<sup>3</sup>, Jung-Hwan Choi<sup>3</sup>, Tony Tae-Hyoung Kim<sup>1</sup>

<sup>1</sup>Nanyang Technological University

<sup>2</sup>Chung-Ang University

<sup>3</sup>Samsung Electronics

## A 80Gb/s/pin Single-Ended PAM-4 Transmitter With an Edge Boosting Auxiliary Driver and a 4-Tap FFE in 28-nm CMOS

Dae-Won Rho<sup>1\*</sup>, Jae-Koo Park<sup>1,2\*</sup>, Seung-Jae Yang<sup>1</sup>, and Woo-Young Choi<sup>1</sup>

<sup>1</sup>Yonsei University, Seoul, Korea

<sup>2</sup>Samsung Electronics, Hwaseong, Korea

#### \*Equally Credited Authors (ECAs)

Demand for higher-bandwidth memory access continues to grow in applications such as data centers, high-performance computing (HPC), and AI processors. High-speed memory interfaces are therefore increasingly important, and a number of techniques are being pursued to overcome the channel bandwidth limitation. In particular, four-level pulse amplitude modulation (PAM-4) is being actively investigated [1-4]. For a given data rate, PAM-4 halves the symbol rate, but at the cost of a reduced signal-to-noise ratio (SNR), which increases the bit error rate (BER) [5]. To minimize this SNR penalty, the level separation mismatch ratio (RLM) must be optimized. In addition, the equalizers must be well tuned to reduce inter-symbol interference (ISI). This paper presents an 80-Gb/s transmitter (TX) with a single-ended PAM-4 voltage-mode (VM) driver implemented in 28-nm CMOS technology, which includes a reconfigurable 4-tap feed-forward equalizer (FFE) and an edge-boosting auxiliary driver for channel equalization.
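To make the SNR cost and the RLM metric concrete, the sketch below computes both for a set of four PAM-4 levels. It assumes the IEEE 802.3bs-style definition of RLM (levels V0 < V1 < V2 < V3, midpoint V_mid = (V0 + V3)/2); the paper does not state which definition it uses, and the function names are hypothetical. At equal peak-to-peak swing, each PAM-4 eye is one third as tall as an NRZ eye, which corresponds to roughly 9.5 dB of SNR penalty.

```python
import math


def rlm(v0: float, v1: float, v2: float, v3: float) -> float:
    """Level separation mismatch ratio for PAM-4 levels v0 < v1 < v2 < v3.

    Assumed IEEE 802.3bs-style definition (the paper does not give its
    formula). Ideal, equally spaced levels give 1.0.
    """
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)  # normalized position of the lower inner level
    es2 = (v2 - v_mid) / (v3 - v_mid)  # normalized position of the upper inner level
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


def pam4_eye_penalty_db() -> float:
    """Eye-height penalty of PAM-4 vs. NRZ at the same peak-to-peak swing.

    PAM-4 stacks three eyes in the swing NRZ uses for one,
    so each eye is 1/3 as tall: 20*log10(3) dB.
    """
    return 20 * math.log10(3)


if __name__ == "__main__":
    print(f"ideal RLM      = {rlm(-3, -1, 1, 3):.3f}")
    # Inner eye stretched, outer eyes squeezed: RLM drops below 1
    print(f"distorted RLM  = {rlm(0.0, 0.9, 2.1, 3.0):.3f}")
    print(f"PAM-4 penalty  = {pam4_eye_penalty_db():.2f} dB")
```

An RLM below 1 means at least one eye is smaller than it would be with equal spacing, which is why the TX exposes a calibration knob for it.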

Fig. 1 illustrates the overall TX structure. A 10-GHz clock from an external source passes through a polyphase filter (PPF), which generates four-phase clocks (C4<sub>I/Q/IB/QB</sub>). Each of these clocks passes through a duty-cycle corrector (DCC) and a quadrature error corrector (QEC), yielding quadrature clocks with accurate phase spacing and duty cycle. These clocks are then divided down to octal-rate clocks (C8<sub>I/Q/IB/QB</sub>) and applied to each data path. The data path begins with a pattern generator that produces PRBS31 data, which are thermometer-encoded into three separate data signals. Re-timers placed before the serializers ensure sufficient timing margin. The serialized data then pass through an 8:4 multiplexer (MUX) and a 4:1 MUX, and each output signal is directed to five bundles of drivers. Within these bundles, one driver is tunable with a calibration code, which allows precise adjustment of the RLM of the PAM-4 data and ensures 50-ohm impedance matching. For the FFE, data selection in the 8:4 MUX generates data for a total of four taps, each of which can be routed to a segmented driver. To extend the output bandwidth, an auxiliary driver with the same structure as a single slice of the bundle driver is added. This auxiliary driver receives data in which transition information is encoded, which shortens the rise and fall times at transitions without distorting the output levels.
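The clock frequencies in this architecture follow directly from the data rate. The sketch below derives them, assuming 2 bits per PAM-4 symbol and that "quarter-rate" and "octal-rate" are defined relative to the symbol rate; `RatePlan` and `rate_plan` are hypothetical names. For 80-Gb/s PAM-4, the 40-GBd symbol rate gives a 25-ps unit interval (UI), a 10-GHz quarter-rate clock that matches the external clock, and a 5-GHz octal-rate clock.

```python
from dataclasses import dataclass


@dataclass
class RatePlan:
    symbol_rate_gbd: float
    ui_ps: float
    quarter_rate_clock_ghz: float
    octal_rate_clock_ghz: float


def rate_plan(data_rate_gbps: float, bits_per_symbol: int) -> RatePlan:
    """Derive symbol rate, UI, and sub-rate clock frequencies for a serializer."""
    baud = data_rate_gbps / bits_per_symbol  # symbol rate in GBd
    return RatePlan(
        symbol_rate_gbd=baud,
        ui_ps=1e3 / baud,               # 1 / (rate in GHz) expressed in ps
        quarter_rate_clock_ghz=baud / 4,  # 4 clock phases, one symbol each per cycle
        octal_rate_clock_ghz=baud / 8,    # 8 parallel lanes per clock cycle
    )


if __name__ == "__main__":
    print(rate_plan(80, 2))  # PAM-4 mode
    print(rate_plan(40, 1))  # NRZ mode at the same symbol rate uses the same clocks
```

This also shows why the 40-Gb/s NRZ measurement can share the same 10-GHz clocking: it runs at the same 40-GBd symbol rate.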

To generate PAM-4 data, the driver can be built from MSB and LSB drivers. For structural symmetry, it is common to use two instances of the MSB driver and one instance of the LSB driver, which provides the required 2:1 MSB-to-LSB weighting [1-3]. Moreover, applying binary-to-thermometer encoding to the MSB and LSB data reduces data toggling, which is the main contributor to power consumption in a VM driver. Fig. 2(a) depicts a conventional low-voltage swing terminated logic (LVSTL) PAM-4 driver, which allows an efficient driver size and impedance control. However, its critical path, which operates at high speed, consists of three stages, leading to poor power-supply-induced jitter (PSIJ). In contrast, Fig. 2(b) illustrates the single-slice driver structure used in our design. It employs a 2-stack structure for impedance control, which reduces the critical path to two stages and improves PSIJ. In the driver's data path, quarter-rate data (D4<sub>(PU/PD)\_(A/B/C)</sub>) are fed into the pull-up and pull-down drivers with opposite polarities. Similarly, the ZQ calibration code is split into pull-up and pull-down codes for each of the A, B, and C paths, each with 5-bit resolution (ZQ<sub>(PU/PD)\_(A/B/C)</sub>[4:0]). Calibrating the ZQ code of each thermometer-encoded path makes it possible to achieve 50-ohm impedance matching and RLM control simultaneously [6].
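The toggle-reduction argument can be checked by counting how many driver slices switch per symbol. The sketch below assumes three equal-weight slices (A, B, C), a binary mapping that drives two slices with the MSB and one with the LSB, and a thermometer mapping in which symbol s turns on s slices; these mappings and all function names are illustrative, not taken from the paper's schematic. Both mappings produce the same output level (the number of active slices), but averaged over uniformly random PAM-4 symbols, the binary mapping toggles 1.5 slices per symbol and the thermometer mapping 1.25, about 17% fewer.

```python
from itertools import product


def binary_slices(symbol: int) -> tuple[int, int, int]:
    """Slices (A, B, C) = (MSB, MSB, LSB): two MSB instances plus one LSB instance."""
    msb, lsb = symbol >> 1, symbol & 1
    return (msb, msb, lsb)


def thermometer_slices(symbol: int) -> tuple[int, ...]:
    """Slices (A, B, C) thermometer-coded: symbol s turns on the first s slices."""
    return tuple(int(symbol > k) for k in range(3))


def avg_toggles(encode) -> float:
    """Average number of slices that switch between two uniformly random PAM-4 symbols."""
    pairs = list(product(range(4), repeat=2))
    total = sum(
        sum(a != b for a, b in zip(encode(prev), encode(cur)))
        for prev, cur in pairs
    )
    return total / len(pairs)


if __name__ == "__main__":
    print("binary     :", avg_toggles(binary_slices))       # 1.5
    print("thermometer:", avg_toggles(thermometer_slices))  # 1.25
```

The worst case is the 1-to-2 symbol transition: the binary mapping switches all three slices, while the thermometer mapping switches only one.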

Fig. 3 illustrates the components of the edge-boosting auxiliary driver: the encoder, timing examples, and the 4:1 MUX at the front end of the auxiliary driver. The encoder consists of logic gates, as shown in Fig. 3. Sequentially applying 9 bits of parallel data (D8[7:0] and D8<sub>PRE</sub>[7]) to these gates produces data that carry data-transition information (D8<sub>R/F</sub>[7:0]). These data are then serialized by the 4:1 MUX. In this process, the pulse width of the 4-phase quadrature clock (C4<sub>I/Q/IB/QB</sub>) can be adjusted with delay cells, which controls the pulse width (t<sub>PW</sub>) of D8<sub>R/F</sub>[7:0]. This scheme generates the input data for the edge-boosting auxiliary driver and allows the pulse width to be finely tuned, giving precise control over the boosting timing.
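The paper does not list the encoder's gate equations, so the following is only a plausible sketch of how 9 input bits can yield 8 transition flags on one data lane. It assumes D8[0] is transmitted first and D8<sub>PRE</sub>[7] is the last bit of the preceding word, so that bit 0 can be compared with its predecessor across the word boundary. The rise/fall logic (current bit AND NOT previous bit, and vice versa) and the name `transition_flags` are hypothetical.

```python
def transition_flags(d8: list[int], d8_pre7: int) -> tuple[list[int], list[int]]:
    """Return (rise, fall) flags for an 8-bit parallel word on one data lane.

    Assumed bit order: d8[0] is sent first; d8_pre7 is bit 7 of the previous word.
    rise[i] = 1 when the lane goes 0 -> 1 at bit i; fall[i] = 1 when it goes 1 -> 0.
    """
    if len(d8) != 8:
        raise ValueError("expected 8 bits")
    prev_bits = [d8_pre7] + d8[:7]  # the 9th input bit closes the word boundary
    rise = [int(cur and not prev) for cur, prev in zip(d8, prev_bits)]
    fall = [int(prev and not cur) for cur, prev in zip(d8, prev_bits)]
    return rise, fall


if __name__ == "__main__":
    rise, fall = transition_flags([1, 1, 0, 0, 1, 0, 1, 1], d8_pre7=0)
    print(rise)  # [1, 0, 0, 0, 1, 0, 1, 0]
    print(fall)  # [0, 0, 1, 0, 0, 1, 0, 0]
```

Under this reading, the auxiliary driver would be active only during flagged UIs, for a duration set by t<sub>PW</sub>, so it adds drive strength at edges while leaving the settled level unchanged.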

Fig. 4 illustrates the architecture of the 8:4 MUX, its timing diagrams, and a table of the corresponding data-selector outputs. Eight data signals, each at 1/8 of the output symbol rate, are generated in parallel. By selecting the appropriate data in the data selector for each of the 4 taps and sampling them at the correct timing, quarter-rate data are obtained. To enhance bandwidth, 4-UI pulse generators are used instead of a 3-stacked MUX. Furthermore, omitting a clock selector for timing-margin optimization reduces the loading on the octal-rate clock, which in turn reduces power consumption.
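As a behavioral reference for what the four tap data streams feed, the sketch below computes a symbol-spaced FFE output y[n] = Σ c<sub>k</sub>·x[n − d<sub>k</sub>]. In hardware, the data selector supplies each tap with the symbol stream shifted by that tap's delay; the sketch models only the resulting weighted sum. The tap placement (one pre-cursor, main, two post-cursor) and coefficients are hypothetical, since the FFE is reconfigurable and its settings are not given. Coefficient magnitudes sum to 1, as they would if taps were realized by allocating slices of a segmented VM driver.

```python
def ffe(symbols: list[float], coeffs: list[float], delays: list[int]) -> list[float]:
    """Symbol-spaced FFE: y[n] = sum_k coeffs[k] * symbols[n - delays[k]].

    A negative delay is a pre-cursor tap (uses a future symbol). Symbols outside
    the sequence are treated as 0, so the first and last outputs show edge effects.
    """
    out = []
    for n in range(len(symbols)):
        acc = 0.0
        for c, d in zip(coeffs, delays):
            k = n - d
            if 0 <= k < len(symbols):
                acc += c * symbols[k]
        out.append(acc)
    return out


# Hypothetical 4-tap setting: 1 pre-cursor, main, 2 post-cursor taps; |c| sums to 1
COEFFS = [-0.0625, 0.75, -0.125, -0.0625]
DELAYS = [-1, 0, 1, 2]

if __name__ == "__main__":
    x = [3, 3, 3, 3, -3, -3, -3, -3]  # PAM-4 levels {-3, -1, 1, 3}
    print(ffe(x, COEFFS, DELAYS)[2:7])  # [1.5, 1.875, -2.625, -1.875, -1.5]
```

Away from the edge the output settles at ±1.5, while the symbols adjacent to the transition reach 1.875 and -2.625. The FFE trades low-frequency swing for high-frequency emphasis, which is consistent with the reduced swing reported when the FFE is enabled.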

Fig. 5(a) and (b) show the measured eye diagrams of the single-ended outputs for PRBS31 40-Gb/s NRZ and PRBS31 80-Gb/s PAM-4, respectively. The PAM-4 output has a voltage swing of 297 mV, with a worst-case eye opening of approximately 45 mV. Fig. 5(c) shows that the 64-Gb/s eye is closed without FFE after passing through a channel with approximately 6.16-dB loss at around 16 GHz, while Fig. 5(d) shows the output for the same channel with the FFE enabled. Although the swing is reduced by about 15%, the eye opens, with a worst-case eye opening of 37 mV. The total power consumption is 246 mW, corresponding to an energy efficiency of 3.07 pJ/bit at 80 Gb/s.
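Two quick numbers help interpret these measurements. The Nyquist frequency of a PAM-4 link is half its symbol rate, so 64 Gb/s (32 GBd) has its Nyquist frequency at 16 GHz, which is presumably why the channel loss is quoted there, and 80 Gb/s places it at 20 GHz. Likewise, the smallest of the three PAM-4 eyes can be at most one third of the swing, about 99 mV for 297 mV, so the 45-mV worst-case eye is roughly 45% of that bound. The helper names below are illustrative.

```python
def pam4_nyquist_ghz(data_rate_gbps: float) -> float:
    """Nyquist frequency = half the symbol rate; PAM-4 carries 2 bits per symbol."""
    return data_rate_gbps / 2 / 2


def ideal_pam4_eye_mv(swing_mv: float) -> float:
    """Upper bound on the smallest eye: three eyes share the full swing."""
    return swing_mv / 3


if __name__ == "__main__":
    print(pam4_nyquist_ghz(64), pam4_nyquist_ghz(80))  # 16.0 20.0
    ideal = ideal_pam4_eye_mv(297)                     # 99.0 mV
    print(f"worst eye / ideal eye = {45 / ideal:.0%}")  # 45%
```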

Fig. 6(a) shows the measured S21 of the channel, and Fig. 6(b) provides a detailed power breakdown. The 4:1 MUX and driver consume the largest portion of power (137.5 mW), followed by the 10-GHz clock distribution (43.3 mW), the 8:4 MUX (33.5 mW), and the pattern generator with encoder (31.7 mW). Fig. 6(c) compares our measurement results with published state-of-the-art TXs. Our work achieves the highest data rate among the single-ended TXs in the comparison.
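The reported breakdown can be cross-checked against the total: the four listed blocks sum to 246.0 mW, and dividing by the 80-Gb/s data rate gives 3.075 pJ/bit, which matches the reported 3.07 pJ/bit to within rounding. The dictionary keys below are only labels for the listed blocks.

```python
# Power breakdown as reported for Fig. 6(b), in mW
POWER_MW = {
    "4:1 MUX + driver": 137.5,
    "10-GHz clock distribution": 43.3,
    "8:4 MUX": 33.5,
    "pattern generator + encoder": 31.7,
}


def energy_per_bit_pj(power_mw: float, data_rate_gbps: float) -> float:
    """mW / (Gb/s) = 1e-3 J/s / 1e9 b/s = 1e-12 J/b, i.e. pJ/bit directly."""
    return power_mw / data_rate_gbps


if __name__ == "__main__":
    total = sum(POWER_MW.values())
    print(f"total = {total:.1f} mW")                                  # 246.0 mW
    print(f"efficiency = {energy_per_bit_pj(total, 80):.3f} pJ/bit")  # 3.075
    for name, p in POWER_MW.items():
        print(f"{name:28s} {p / total:5.1%}")
```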

#### Acknowledgments

This work was supported by Samsung Electronics Co., Ltd. (IO201218-08228-01). The EDA tools were supported by the IC Design Education Center (IDEC), Korea.

#### **References:**

[1] Z. Toprak-Deniz *et al.*, "A 128-Gb/s 1.3-pJ/b PAM-4 Transmitter With Reconfigurable 3-Tap FFE in 14-nm CMOS," *IEEE JSSC*, vol. 55, no. 1, pp. 19–26, Jan. 2020.

[2] P. J. Peng *et al.*, "A 112-Gb/s PAM-4 Voltage-Mode Transmitter with Four-Tap Two-Step FFE and Automatic Phase Alignment Techniques in 40-nm CMOS," *IEEE JSSC*, vol. 56, no. 7, pp. 2123–2131, 2021.

[3] J. Kim *et al.*, "A 224-Gb/s DAC-Based PAM-4 Quarter-Rate Transmitter with 8-Tap FFE in 10-nm FinFET," *IEEE JSSC*, vol. 57, no. 1, pp. 6–20, 2022.

[4] Z. Wang *et al.*, "An Output Bandwidth Optimized 200-Gb/s PAM-4 100-Gb/s NRZ Transmitter with 5-Tap FFE in 28-nm CMOS," *IEEE JSSC*, vol. 57, no. 1, pp. 21–31, 2022.

[5] K. R. Lakshmikumar *et al.*, "A Process and Temperature Insensitive CMOS Linear TIA for 100 Gb/s/λ PAM-4 Optical Links," *IEEE JSSC*, vol. 54, no. 11, pp. 3180–3190, 2019.

[6] Y. U. Jeong *et al.*, "A 0.64-pJ/Bit 28-Gb/s/Pin High-Linearity Single-Ended PAM-4 Transmitter with an Impedance-Matched Driver and Three-Point ZQ Calibration for Memory Interface," *IEEE JSSC*, vol. 56, no. 4, pp. 1278–1287, 2021.

[7] J.-H. Park *et al.*, "A 32Gb/s/pin 0.51 pJ/b Single-Ended Resistorless Impedance-Matched Transmitter with a T-Coil-Based Edge-Boosting Equalizer in 40nm CMOS," *ISSCC*, pp. 410–412, 2023.


#### IEEE ASSCC 2023 / Session 10 / Paper 10.2









# A 80Gb/s/pin Single-Ended PAM-4 Transmitter With an Edge Boosting Auxiliary Driver and a 4-Tap FFE in 28-nm CMOS

## <u>Dae-won Rho</u><sup>1\*</sup>, Jae-Koo Park<sup>1, 2\*</sup>, Seung-Jae Yang<sup>1</sup>, and Woo-Young Choi<sup>1</sup>

<sup>1</sup>Yonsei University, <sup>2</sup>Samsung Electronics

\*Dae-won Rho and Jae-Koo Park contributed equally.

Session 10

IEEE Asian Solid-State Circuits Conference 2023

