# A Bipolar 32bit Parallel Multiplier-Accumulator Using a 107K-Component VLSI Macrocell

Masao SUZUKI and Michihiro HIRATA

NTT Electrical Communications Laboratories, 3-1, Wakamiya, Morinosato, Atsugi, Kanagawa, Japan

A 32bit bipolar parallel multiplier-accumulator with a 14.5 ns multiplication cycle time and 7.5 W/chip power dissipation has been developed using a novel 107K-component VLSI macrocell array. A basic macrocell is composed of 16 transistors and 18 resistors. To achieve a short multiplication time, the VLSI is based on a modified Booth's algorithm and is designed using a Wallace tree, a 2-stage 67bit carry-lookahead adder, and pipeline registers.

## 1. Introduction

Si bipolar VLSI technology has already produced a 20.4 ns, 10.7K-gate 32bit parallel multiplier dissipating 12.2 W on a masterslice chip [1].

This paper describes a faster and more complex 32bit parallel multiplier-accumulator built on a newly developed 107K-component VLSI macrocell array. Sophisticated macrocell circuit design combined with a 1.5  $\mu m$  rule oxide isolation process yields an equivalent 14.9K gates using 87 % of the 3000 internal cells. The measured multiplication cycle time is 14.5 ns and the measured power dissipation is 7.5 W/chip, with ECL10K compatibility.

## 2. VLSI Macrocell Array

### A. Macrocell Configuration

To achieve a highly functional macrocell, a new cell structure having more components and wiring channels than that of a previous macrocell [2] is designed. The component schematic of the resulting internal cell is shown in Fig. 1. Each cell contains 16 transistors and 18 resistors. The component count per cell is set so as to implement the latch function with a set, a reset, and gated clocks in each cell, as shown in Fig. 2.

To achieve high wirability, many wiring channels are prepared within a cell. The 27 1st metal channels lie horizontally, while the 36 2nd metal channels lie vertically. Twelve of the 1st and 8 of the 2nd metal channels are used for power and reference busses. The 3rd metal layer is used only for power busses. The wiring pitches of the 1st and 2nd metal channels are both 5  $\mu m$ . Cell size is 150 x 180  $\mu m$ .









### B. Process

To obtain a good yield for a highly complex chip, a well-proven, practical process is applied. Top and cross-sectional views of a basic transistor are shown in Fig. 3. The main features of the fabrication process are as follows:

1) a 1.5  $\mu m$  design rule,
2) oxide-isolation,
3) walled emitter structure,
4) shallow junction, and
5) 5  $\mu m$  pitch 3-level metallization.

The main characteristics of an internal transistor are summarized in Table I. The small transistor size of 195  $\mu m^2$  and the walled emitter structure result in small junction capacitances.

### C. Chip Configuration

The chip configuration is shown in Fig. 4. A matrix of 50 x 60 (= 3000 cells) is placed on a chip. Moreover, 168 input buffer cells, 84 output buffer cells, and 84 reference generators are located on the chip's periphery. The I/O interface is ECL 10K or ECL 100K compatible. The circuit design technique for the VLSI macrocell array is the same as that used previously [2]. However, the operation currents have been optimized for this new device, as shown in Fig. 5. A switching current of 0.22 mA is used for an internal 2-level series gated current switch. Depending on load conditions, the emitter follower current can be set at 0.15 mA, 0.3 mA or 0.45 mA. In total, there are about 50.2K transistors and 56.5K resistors on a 10.8 x 10.8 mm chip.

There are 180 pads for I/O signals and 92 pads for power supply. The chip requires a GND and two supply voltages:  $V_{EE} = -4.5$  V and  $V_{TT} = -2$  V. Signal wiring can be automatically routed using the 900 1st metal channels and the 1400 2nd metal channels. If all internal cells and I/O buffers are used, the chip dissipates a maximum of 9.5 W.
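The channel and component counts quoted in Sections 2A and 2C can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check, not part of the design flow. It assumes the 50 x 60 matrix is laid out as 60 rows by 50 columns of cells that are 180  $\mu m$  wide and 150  $\mu m$  tall; the text does not state this orientation, but it is the one that reproduces the quoted channel totals. All variable names are illustrative.

```python
# Figures quoted in the text (Sections 2A and 2C).
ROWS, COLS = 60, 50                  # ASSUMED orientation of the 50 x 60 matrix
CELL_W_UM, CELL_H_UM = 180, 150      # ASSUMED: 180 um wide, 150 um tall
M1_PER_CELL, M1_POWER = 27, 12       # horizontal 1st metal channels / power+reference
M2_PER_CELL, M2_POWER = 36, 8        # vertical 2nd metal channels / power+reference
TRANSISTORS_PER_CELL, RESISTORS_PER_CELL = 16, 18

# Horizontal tracks stack up over the rows, vertical tracks over the columns.
m1_signal = (M1_PER_CELL - M1_POWER) * ROWS
m2_signal = (M2_PER_CELL - M2_POWER) * COLS

# Footprint of the internal cell array in millimetres.
array_w_mm = COLS * CELL_W_UM / 1000
array_h_mm = ROWS * CELL_H_UM / 1000

# Components contributed by the internal cells alone.
cells = ROWS * COLS
internal_transistors = cells * TRANSISTORS_PER_CELL
internal_resistors = cells * RESISTORS_PER_CELL

print("signal channels (1st, 2nd metal):", m1_signal, m2_signal)
print("cell array footprint (mm):", array_w_mm, "x", array_h_mm)
print("internal transistors / resistors:", internal_transistors, internal_resistors)
```

Under these assumptions the signal channel counts come out at 900 and 1400, and the cell array occupies 9 x 9 mm of the 10.8 x 10.8 mm chip. The internal cells account for 48K transistors and 54K resistors, so the remainder of the quoted 50.2K and 56.5K would belong to the peripheral buffers and reference generators.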

Table II Experimental Results

| Parameter            | Condition             | Value     |
|----------------------|-----------------------|-----------|
| Current switch delay |                       | 270 ps    |
| Series-gate delay    |                       | 325 ps    |
| Fan-in delay         |                       | 40 ps/FI  |
| Fan-out delay        |                       | 15 ps/FO  |
| Metal delay          | $I_{EF} = 0.15$ mA    | 270 ps/mm |
|                      | $I_{EF} = 0.30$ mA    | 150 ps/mm |
|                      | $I_{EF} = 0.45$ mA    | 90 ps/mm  |



Fig. 3 Top and cross-sectional views of an internal transistor.

Table I Transistor Characteristics

| Parameter                           | Value   | Unit |
|-------------------------------------|---------|------|
| Emitter size                        | 1.4 x 2 | 2 µm |
| Base series resistance              | 1465    | ohm  |
| Base-emitter junction capacitance   | 10.9    | fF   |
| Base-collector junction capacitance | 9.3     | fF   |
| Collector-isolation capacitance     | 48.3    | fF   |
| Cutoff frequency $(f_T)$            | 8.0     | GHz  |



Fig. 4 Chip structure of the VLSI macrocell array.



Fig. 5 Circuit schematics for the VLSI macrocell array.

### D. Macrocell Performance

To verify masterslice performance, a test chip including circuit chains with various loading conditions was fabricated and tested. The experimental results are summarized in Table II. The basic OR/NOR current switch delay was 270 ps at a power dissipation of 1.29 mW, giving a power-delay product of 0.35 pJ. These results agreed well with the design values.
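The coefficients in Table II lend themselves to a simple first-order delay estimate. The sketch below assumes an additive model (base delay plus per-load and per-millimetre terms), which the paper does not state explicitly; the function names, the interpretation of fan-in and fan-out as additional loads, and the dictionary keys are illustrative. It also reproduces the quoted power-delay product.

```python
# Measured coefficients from Table II (all delays in picoseconds).
BASE_DELAY_PS = {"current_switch": 270.0, "series_gate": 325.0}
FAN_IN_PS = 40.0     # per fan-in
FAN_OUT_PS = 15.0    # per fan-out
# Metal delay per mm, keyed by emitter follower current I_EF in microamps.
METAL_PS_PER_MM = {150: 270.0, 300: 150.0, 450: 90.0}


def stage_delay_ps(kind="current_switch", fan_in=0, fan_out=0,
                   wire_mm=0.0, i_ef_ua=300):
    """ASSUMED additive model: base delay + loading terms + wiring term."""
    return (BASE_DELAY_PS[kind]
            + FAN_IN_PS * fan_in
            + FAN_OUT_PS * fan_out
            + METAL_PS_PER_MM[i_ef_ua] * wire_mm)


def power_delay_pj(power_mw, delay_ps):
    """mW times ps gives fJ, so divide by 1000 to express the result in pJ."""
    return power_mw * delay_ps / 1000.0


# Basic OR/NOR gate from the text: 1.29 mW at 270 ps, about 0.35 pJ.
print(round(power_delay_pj(1.29, 270.0), 2))

# Hypothetical series-gated stage with one extra input, two loads, 1 mm of wire.
print(stage_delay_ps("series_gate", fan_in=1, fan_out=2, wire_mm=1.0, i_ef_ua=450))
```

The second call is a made-up loading case chosen only to exercise every term; it does not correspond to a path measured in the paper.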

## 3. Multiplier-Accumulator Design

A block diagram of the 32bit multiplier-accumulator LSI is shown in Fig. 6. To minimize multiplication time, a modified Booth's algorithm is employed, together with a modified Wallace tree for the partial product addition and a 2-stage 67bit carry-lookahead adder for the final product. The LSI provides 3 registers to store all the I/O data: a multiplicand register, a multiplier register, and a product accumulator register. This makes pipelined data management easy. All registers are driven by separate clocks, allowing flexible and precise timing adjustments. In addition to the add and subtract functions, a round function is also available for flexible applications.
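The arithmetic behind this datapath can be illustrated in software. The following is a minimal textbook sketch of radix-4 (modified) Booth recoding followed by carry-save reduction, not a model of the chip's actual logic: the function names are hypothetical, Python's arbitrary-precision integers stand in for the 67bit datapath, and a plain `sum` stands in for the carry-lookahead adder.

```python
def booth_radix4_digits(multiplier: int, bits: int = 32) -> list[int]:
    """Recode a two's-complement multiplier into bits/2 digits in {-2,...,2}."""
    assert bits % 2 == 0
    y = multiplier & ((1 << bits) - 1)   # two's-complement bit pattern
    digits = []
    prev = 0                             # implicit bit below the LSB
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """3:2 compression on whole words: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def multiply_accumulate(x: int, y: int, acc: int = 0, bits: int = 32) -> int:
    """Return x * y + acc using Booth partial products and carry-save reduction."""
    # One partial product per Booth digit: half as many as a bit-by-bit array.
    operands = [(d * x) << (2 * k)
                for k, d in enumerate(booth_radix4_digits(y, bits))]
    operands.append(acc)                 # the accumulator joins the same tree
    # Reduce three operands to two, level by level, until two remain.
    while len(operands) > 2:
        leftover = len(operands) % 3
        reduced = []
        for i in range(0, len(operands) - 2, 3):
            reduced.extend(carry_save_add(*operands[i:i + 3]))
        reduced.extend(operands[len(operands) - leftover:])
        operands = reduced
    return sum(operands)                 # stand-in for the carry-lookahead adder


print(booth_radix4_digits(6, bits=4))
print(multiply_accumulate(7, -3, acc=100))
```

Recoding two multiplier bits at a time halves the number of partial products, and the carry-save levels defer carry propagation to a single final addition, which is why this combination is used when multiplication time must be minimized.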

To implement this complex function, a total of 20 kinds of macrocells are designed and 1385 circuits are used, as summarized in Table III. In this multiplier-accumulator, a high functionality of about 5.6 equivalent gates/cell is achieved owing to the sophisticated circuit design technique. As a design example, the dual Booth's multiplexer is shown in Fig. 7. For this example, an equivalent gate count of over 6.3 gates/cell (19 gates/3 cells) is achieved with only 4 mW power dissipation. This LSI consists of an equivalent 14.9K gates using 87 % of the 3000 internal cells. Consequently, the VLSI macrocell array has a gate count of 17K gates when all internal cells are used. A microphotograph of the customized chip is shown in Fig. 8.

Table III Macrocells used and gate count

| Logic Function                       | Number of Circuits | Number of Cells | Number of Gates |
|--------------------------------------|--------------------|-----------------|-----------------|
| 1) Input Buffer (Inverter)           | 70                 | 70              | 70              |
| 2) Output Buffer (Inverter)          | 67                 | 67              | 67              |
| 3) Dual Carry Save Full Adder        | 257                | 771             | 6168            |
| 4) Dual 2 In AND/NAND                | 71                 | 71              | 142             |
| 5) Dual Buffer                       | 4                  | 4               | 8               |
| 6) Dual 2 In NOR                     | 1                  | 1               | 2               |
| 7) 2:1 Selector/Multiplexer          | 67                 | 67              | 268             |
| 8) 2 In EX-OR/NOR                    | 134                | 134             | 536             |
| 9) 5 In AND/NAND                     | 28                 | 28              | 28              |
| 10) Half Adder                       | 35                 | 35              | 175             |
| 11) Dual Booth's Multiplexer (1)     | 225                | 675             | 4275            |
| 12) Booth's Multiplexer MSB          | 16                 | 32              | 112             |
| 13) Dual Booth's Multiplexer LSB     | 15                 | 45              | 270             |
| 14) Dual Booth's Multiplexer (2)     | 16                 | 32              | 256             |
| 15) 2 In AND/NAND, 2 In OR/NOR       | 66                 | 66              | 132             |
| 16) 1-2-3-4 In NAND/AND              | 16                 | 32              | 80              |
| 17) 1-2-3 In / 1-2 In NAND/AND       | 20                 | 40              | 140             |
| 18) 3 In AND/EX-OR                   | 51                 | 102             | 306             |
| 19) Quint 1-2-3-4-5 In NAND-AND/NAND | 9                  | 27              | 54              |
| 20) D-Type Master Slave Flip-Flop    | 200                | 400             | 1600            |
| 21) 2 Bit Booth's Decoder            | 15                 | 60              | 195             |
| 22) Logic 0/1 Generator              | 2                  | 2               | 2               |
| Total                                | 1385               | 2761            | 14886           |

Fig. 6 Block diagram of the 32bit multiplier-accumulator.
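The gate-density figures quoted in this section can be reproduced from the rows of Table III. The sketch below sums the columns and then recomputes utilization and gates/cell. It assumes that the input and output buffers occupy peripheral buffer cells rather than internal cells; the text does not say so, but that reading reproduces the quoted 87 % and 5.6 gates/cell. Variable names are illustrative.

```python
# (logic function, circuits, cells, gates) transcribed from Table III.
TABLE_III = [
    ("Input Buffer (Inverter)", 70, 70, 70),
    ("Output Buffer (Inverter)", 67, 67, 67),
    ("Dual Carry Save Full Adder", 257, 771, 6168),
    ("Dual 2 In AND/NAND", 71, 71, 142),
    ("Dual Buffer", 4, 4, 8),
    ("Dual 2 In NOR", 1, 1, 2),
    ("2:1 Selector/Multiplexer", 67, 67, 268),
    ("2 In EX-OR/NOR", 134, 134, 536),
    ("5 In AND/NAND", 28, 28, 28),
    ("Half Adder", 35, 35, 175),
    ("Dual Booth's Multiplexer (1)", 225, 675, 4275),
    ("Booth's Multiplexer MSB", 16, 32, 112),
    ("Dual Booth's Multiplexer LSB", 15, 45, 270),
    ("Dual Booth's Multiplexer (2)", 16, 32, 256),
    ("2 In AND/NAND, 2 In OR/NOR", 66, 66, 132),
    ("1-2-3-4 In NAND/AND", 16, 32, 80),
    ("1-2-3 In / 1-2 In NAND/AND", 20, 40, 140),
    ("3 In AND/EX-OR", 51, 102, 306),
    ("Quint 1-2-3-4-5 In NAND-AND/NAND", 9, 27, 54),
    ("D-Type Master Slave Flip-Flop", 200, 400, 1600),
    ("2 Bit Booth's Decoder", 15, 60, 195),
    ("Logic 0/1 Generator", 2, 2, 2),
]
INTERNAL_CELLS_AVAILABLE = 3000
# ASSUMPTION: I/O buffers sit in peripheral cells, not in the internal array.
IO_BUFFERS = {"Input Buffer (Inverter)", "Output Buffer (Inverter)"}

total_circuits = sum(row[1] for row in TABLE_III)
total_cells = sum(row[2] for row in TABLE_III)
total_gates = sum(row[3] for row in TABLE_III)

internal_rows = [row for row in TABLE_III if row[0] not in IO_BUFFERS]
internal_cells = sum(row[2] for row in internal_rows)
internal_gates = sum(row[3] for row in internal_rows)

utilization = internal_cells / INTERNAL_CELLS_AVAILABLE
gates_per_cell = internal_gates / internal_cells
full_array_gates = gates_per_cell * INTERNAL_CELLS_AVAILABLE

print("totals (circuits, cells, gates):", total_circuits, total_cells, total_gates)
print("internal cell utilization: %.1f %%" % (100 * utilization))
print("equivalent gates per internal cell: %.2f" % gates_per_cell)
print("projected gates at full utilization: %.0f" % full_array_gates)
```

With this reading, the circuit total matches the 1385 stated in the text, the gate total rounds to 14.9K, and scaling the per-cell density to all 3000 internal cells gives roughly the 17K gates quoted for a fully used array.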

The VLSI macrocell array has been customized automatically using a conventional CAD system. Some of the results obtained are discussed here. The net length distribution, excluding the wiring used to implement the macrocells themselves, is shown in Fig. 9. The total net count is only 2907 and the average net length is 2.31 mm because of the macrocell array approach. The total wiring length is about 6.7 m. The occupied channel region is nearly 32 % of the total channels.
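The quoted total wiring length is consistent with the net count and the average net length. Writing  $N_{net}$  for the net count and  $\bar{\ell}$  for the average net length (symbols introduced here for illustration only):

$$
L_{total} \approx N_{net} \times \bar{\ell} = 2907 \times 2.31\ \text{mm} \approx 6.7\ \text{m}
$$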

## 4. Performance of the Multiplier-Accumulator

Despite the complexity of 14.9K gates, fully functional chips have been obtained with a power dissipation of 7.5 W/chip. A Schmoo plot of critical path test results for  $V_{TT}$  versus  $V_{EE}$  is shown in Fig. 10, with the multiplication cycle time as a parameter. The nominal external supply voltages are  $V_{TT} = -2.0$  V and  $V_{EE} = -4.5$  V. A wide operating range is obtained for these power supply voltages. A minimum cycle time of 14.5 ns is achieved when the supply voltages are optimized. The critical path passes through 23 current switch stages, which corresponds to 38 simple OR/NOR gates. As a result, an equivalent gate performance of 380 ps with 0.5 mW/gate is achieved for the actual VLSI circuits.
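The equivalent per-gate figures appear to follow directly from the chip-level measurements. The symbols  $t_{gate}$  and  $P_{gate}$  are introduced here for illustration, and dividing the chip power by the equivalent gate count is an assumption about how the 0.5 mW/gate figure was obtained:

$$
t_{gate} \approx \frac{14.5\ \text{ns}}{38} \approx 380\ \text{ps}, \qquad P_{gate} \approx \frac{7.5\ \text{W}}{14.9 \times 10^{3}} \approx 0.5\ \text{mW}
$$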

## 5. Summary

A 14.5 ns 32bit parallel multiplier-accumulator with 7.5 W/chip power dissipation has been successfully demonstrated using a newly developed VLSI macrocell array. The results confirm that the technologies described here are very effective in achieving high-performance, highly complex bipolar VLSI logic.

## References

- [1] T. Nishimura, et al., "A Bipolar 18K Gate Variable-Size Cell Masterslice", ISSCC Digest of Tech. Papers, Feb. 1986.
- [2] M. Suzuki, et al., "A 333ps 800MHz 7K-Gate Bipolar Macrocell Array Employing 4-Level Metallization", IEEE JSSC, vol. SC-19, No. 4, Aug. 1984.



Fig. 8 Microphotograph of the customized chip.



Fig. 9 Distribution of wiring net length.



Fig. 10 Schmoo plot of the critical operation.