# P-5-6

# Additional 12% Power Reduction in Practical Digital Chips of Low-Power Design using Post-Fabrication Clock-Timing Adjustment

Eiichi Takahashi<sup>1</sup>, Tatsuya Susa<sup>2</sup>, Masahiro Murakawa<sup>1</sup>, Tatsumi Furuya<sup>2</sup>, Tetsuya Higuchi<sup>1</sup>, Shinji Furuichi<sup>3</sup>, Yoshitaka Ueda<sup>3</sup> and Atsushi Wada<sup>3</sup>

> <sup>1</sup>Information Technology Research Institute, AIST Central 2 (MBOX 35225), 1-1-1, Umezono, Tsukuba, Ibaraki 305-8568, Japan. Phone: +81-29-861-5162, E-mail: e.takahashi@aist.go.jp
>
> <sup>2</sup>Graduate School of Science, Toho University, 2-2-1, Miyama, Funabashi-shi, Chiba 274-8510, Japan
>
> <sup>3</sup>Sanyo Electric Co., Ltd., 180, Ohmori, Anpachi-cho, Anpachi-gun, Gifu 503-0195, Japan

### 1. Introduction

To address the clock-skew problem, which grows more serious as processes scale below 100 nm, we proposed a post-fabrication clock-timing adjustment technique [1] and have continued to improve it and broaden its applications. Several kinds of test chips have been designed and fabricated, and experimental results from those chips have been reported in several papers. In this paper, the technique is applied to practical circuits that already use a low-power design. Experiments demonstrate an additional 12% reduction in power consumption even on these low-power chips.

### 2. Post-Fabrication Clock-Timing Adjustment

Post-fabrication clock-timing adjustment can be regarded as an extended trimming technique that determines the values of all trimming devices automatically and simultaneously. It has several advantages. It can handle many trimming devices: in our experiments, up to one thousand devices were adjusted successfully. The adjustment is carried out by genetic algorithms (GA), a class of robust search algorithms from artificial intelligence. Clock timing itself is not measured; instead, the results of function tests are used to adjust it.
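The idea of searching delay settings using only function-test results can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not the authors' GA: the error model in `count_errors` (a hidden ideal setting per delayer), the population size, truncation selection with elitism, one-point crossover, and the mutation rate are all hypothetical.

```python
import random

CODE_BITS = 4                      # each delayer takes a 4-bit setting
MAX_CODE = (1 << CODE_BITS) - 1    # largest code value, 15

def count_errors(codes, target):
    """Stand-in for the function test (hypothetical model).

    A real chip would report a hardware error count; here the count simply
    grows with the distance of each code from a hidden ideal value.
    """
    return sum(abs(c - t) for c, t in zip(codes, target))

def adjust(target, pop_size=30, generations=200, mutation_rate=0.05, seed=0):
    """Search for low-error delay codes using only the error count."""
    rng = random.Random(seed)
    n = len(target)
    fitness = lambda ind: count_errors(ind, target)
    pop = [[rng.randint(0, MAX_CODE) for _ in range(n)] for _ in range(pop_size)]
    history = []                   # best error count seen in each generation
    for _ in range(generations):
        pop.sort(key=fitness)
        history.append(fitness(pop[0]))
        if history[-1] == 0:       # all function tests pass
            break
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [rng.randint(0, MAX_CODE) if rng.random() < mutation_rate else g
                     for g in child]          # per-gene mutation
            children.append(child)
        pop = elite + children
    return pop[0], history

if __name__ == "__main__":
    hidden = [3, 7, 0, 15, 8, 8, 1, 12]       # hypothetical ideal settings
    best, history = adjust(hidden)
    print("best codes:", best, "errors:", history[-1])
```

Because the better half of the population is carried over unchanged, the best error count never increases from one generation to the next, which mirrors the intent of using test results as the only feedback.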

Fig.1 gives an overview of the post-fabrication clock-timing adjustment. First, programmable delay circuits (delayers) are inserted into a standard LSI designed by conventional methods. Next, LSI chips are fabricated according to standard procedures. When the chips are tested after fabrication, the values of the programmable delay circuits are determined by the GA-based adjustment software running on the LSI tester. Finally, the chips are shipped after adjustment; not only is the operational yield increased, but the chips also operate at a lower power-supply voltage.

### 3. DCT Test Chip Design

The application target is a DCT circuit with a low-power design [2], and the purpose of this study is to achieve an additional reduction in the power consumption of the chips.

Fig.2 shows a block diagram of the DCT circuit. The circuit block contains a "timing-adjustable" DCT and a reference DCT. To accelerate iterative function tests, the outputs of the two DCT circuits are compared and errors are counted by hardware counters. The timing-adjustable DCT has 1,031 programmable delayers; each delay is set by a 4-bit value and ranges from 320 ps to 1,015 ps.
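As a rough illustration of the delayer range, the sketch below maps a 4-bit code to a delay value. The linear, evenly spaced mapping is an assumption; the text gives only the two end points (320 ps and 1,015 ps), and the function name `delay_ps` is hypothetical.

```python
MIN_DELAY_PS = 320.0    # shortest delay stated in the text
MAX_DELAY_PS = 1015.0   # longest delay stated in the text
CODE_BITS = 4           # width of the delay setting

def delay_ps(code):
    """Return the delay for a 4-bit code, assuming evenly spaced steps."""
    levels = 2 ** CODE_BITS
    if not 0 <= code < levels:
        raise ValueError("code must be in the range 0..15")
    step = (MAX_DELAY_PS - MIN_DELAY_PS) / (levels - 1)
    return MIN_DELAY_PS + code * step

if __name__ == "__main__":
    for code in (0, 1, 15):
        print(code, round(delay_ps(code), 1))
```

Under this assumption, adjacent codes differ by 695/15 ps, or a little over 46 ps.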

Table I summarizes the specifications of the chip, and Fig.3 is a photomicrograph of the chip. The photo shows that the chip contains eight DCT blocks lying side by side.

This chip is a revised version, and the first implementation is reported in reference [3].

### 4. Experimental Results

The experimental system consists of a PCB on which the chip is mounted, a PCB that controls the test chip, a power supply for the PCBs, a pulse generator that supplies the clock, and a PC that controls these instruments and executes the GA-based adjustment software.

Although the DCT contains 1,031 delayers, we restricted the number of adjusted delayers to 300 using static timing analysis (STA) results [4]. The area overhead for the 300 delayers is 5% on the chip. Ten chips were measured, and results from 66 DCTs were statistically processed.
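To give a sense of why a search algorithm is needed and why reducing the number of adjusted delayers helps, the following sketch counts the possible settings when each delayer takes an independent 4-bit code. The helper name `search_space_digits` is hypothetical, and treating all codes as independent choices is an assumption.

```python
def search_space_digits(num_delayers, code_bits=4):
    """Number of decimal digits in the count of possible delay settings."""
    combinations = (2 ** code_bits) ** num_delayers
    return len(str(combinations))

if __name__ == "__main__":
    for n in (300, 1031):
        print(n, "delayers ->", search_space_digits(n), "decimal digits")
```

Even with 300 delayers the number of combinations has hundreds of digits, so exhaustive testing is out of the question and a heuristic search is a natural fit.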

Fig.4 shows the experimental results for the dependency of operational yield on power-supply voltage. The maximum yield improvement is 55%, and the maximum reduction in power consumption is 12%.
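The link between a lower supply voltage and lower power can be sketched with the common first-order model in which dynamic power scales with the square of the supply voltage at a fixed clock frequency. This model is an assumption made here for illustration; the text reports the measured power reduction, not the voltage ratio computed below, and the function name is hypothetical.

```python
import math

def voltage_ratio_for_power_reduction(reduction):
    """Supply-voltage ratio giving the stated power reduction, assuming P ~ V^2."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return math.sqrt(1.0 - reduction)

if __name__ == "__main__":
    ratio = voltage_ratio_for_power_reduction(0.12)
    print("voltage ratio for a 12% power reduction:", round(ratio, 3))
```

Under this simplified model, a 12% power reduction corresponds to lowering the supply voltage by roughly 6%.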

The chip has eight DCT circuits, and Fig.5 shows the differences among the eight. The eight curves form four groups of two each. Before adjustment, there is a difference of 0.1 V between the pair of outermost circuits and the pair of innermost circuits due to IR drop. After adjustment, the difference is reduced to 0.07 V.

The elapsed adjustment time in this experimental setup is 5 minutes, and based on the results of past experiments we predict that the adjustment time on an LSI tester is 0.5 seconds. This implies that a commercial LSI tester that tests 32 chips simultaneously can adjust the chips in about 15 ms per chip.
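The per-chip figure follows from dividing the predicted adjustment time by the number of chips handled in parallel. The short sketch below reproduces that arithmetic; the function name `per_chip_ms` is hypothetical.

```python
def per_chip_ms(adjust_time_s, chips_in_parallel):
    """Effective adjustment time per chip in milliseconds."""
    return adjust_time_s * 1000.0 / chips_in_parallel

if __name__ == "__main__":
    # 0.5 s predicted on the tester, 32 chips tested simultaneously
    print(per_chip_ms(0.5, 32), "ms per chip")
```

The result is 15.625 ms, which the text quotes as 15 ms.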

### 5. Conclusions

By applying the post-fabrication clock-timing adjustment to test chips based on a commercial low-power logic design, an additional 12% power reduction was achieved. These results demonstrate that the adjustment method is practically useful for commercial chips.

### Acknowledgements

The DCT test chip was designed as part of the MIRAI project supported by NEDO. The authors thank Dr. Hirose and Dr. Masuhara of the MIRAI Project, and Mr. Ibaraki and Mr. Fukase of SANYO, for their encouragement.

### References

- [1] E. Takahashi et al., J. Solid-State Circuits, vol.39, (Apr. 2004) 643-650.
- [2] H. Yamauchi et al., *ISSCC Digest of Technical Papers* (2005) 130-131.
- [3] S. Furuichi et al., Proc. of IEEE Asian Solid-State Circuits Conference (2007) 268-271.
- [4] T. Susa et al., Proc. of the 14th Workshop on Synthesis And System Integration of Mixed Information technologies (2007) 166-173.



Fig.1 Post-Fabrication Clock-Timing Adjustment



Fig.2 Block diagram of the DCT Test Chip



Fig.3 Photomicrograph and eight blocks of the DCT Test Chip



Fig.5 Operational yield dependency on power-supply voltage using the adjustment: (a) Before Adjustment





Table I Design Specifications for the DCT Test Chip

| 200 MHz (Worst Case): Design Constraint |
|-----------------------------------------|
| 90 nm 6-layer CMOS Technology           |
| 5 mm x 5 mm                             |
| 2.41M gates                             |
| 1.0 V (Core blocks), 2.5 V (I/O blocks) |