Extended Abstracts of the 22nd (1990 International) Conference on Solid State Devices and Materials, Sendai, 1990, pp. 569-572

# A 1GIPS Josephson Data Processor

### Yuji HATANO, Shin'ichirou YANO, Hiroyuki MORI, Hiroji YAMADA, Kouji NAKAHARA, Mikio HIRANO, and Ushio KAWABE

Central Research Laboratory, Hitachi Ltd., Higashi-Koigakubo 1-280, Kokubunji, Tokyo 185, Japan

Abstract: A 4-bit Josephson data processor with a 16-instruction set, a 192b register file, and 1Kb-external-RAM access capability has been designed, fabricated, and tested. Each instruction is processed by a three-stage pipeline of instruction fetch, data fetch, and decode/execute. The chip is operable under a 1-GHz clock, and it has a peak performance of 1 GIPS. The 7 mm square chip includes 3665 gates, 14960 Josephson junctions, and 15462 resistors. The fabrication process is 2.5-$\mu$m-rule Nb/AlOx/Nb. The AC power is used with both polarities in each of the 4 blocks, thus realizing an 8-fold serial power supply. Power consumption is 40 mW. Function tests showed successful operations. Part of the processor operated at 1 GHz.

### I. INTRODUCTION

Many Josephson LSIs [1]-[6] have been fabricated using the Nb/AlOx/Nb technology. These include an 8b DSP [1], a 7.6K-gate macrocell array [2], 4Kb RAMs [3]-[4], and a 4-chip 4b microcomputer [5]. The authors have reported on a 4b data processor with an 8-instruction set [6].

Although two-phase [4] or three-phase [1] AC drive is used in recent Josephson processors, we have adopted single-phase AC drive because it simplifies the powering system outside the chip. Single-phase AC drive requires magnetically coupled gates, which have a more complex fabrication process than direct-coupled gates.

In this paper, a 4-bit 1-GIPS data processor with a 16-instruction set and 1Kb-external-RAM access capability is reported. The fabrication process for the Josephson magnetically coupled gates is first reviewed briefly. The design and operating results of the processor are then described.



Fig.1 Cross section of the Josephson device.

Table 1 Layer construction of the processor.

| Layer   | Material | Patterning  | Thickness (nm) | Function                          |
|---------|----------|-------------|----------------|-----------------------------------|
| M1      | Nb       | RIE*        | 250            | Ground plane                      |
| I1a     | SiO      | Lift off    | 220            | Ground plane isolation            |
| I1b     | SiO      | Lift off    | 160            | Ground plane isolation            |
| R       | MoNx     | Ion Milling | 98             | Resistor                          |
| I1c     | SiO      | Lift off    | 320            | Resistor isolation                |
| M2      | Nb       | RIE         | 160            | Base electrode, first wiring      |
| Barrier | Al, AlOx | RIE         | 6.8            | Tunnel barrier                    |
| M3      | Nb       | RIE         | 80             | Counter electrode                 |
| I2a     | Si       | Lift off    | 160            | Base electrode isolation          |
| I2b     | Si       | Lift off    | 360            | Base electrode isolation          |
| M3c     | PbInAu   | Lift off    | 520            | Counter electrode contact         |
| I3      | SiO      | Lift off    | 720            | Base, counter electrode isolation |
| M4      | PbInAu   | Lift off    | 1,400          | Control electrode, second wiring  |
| I4      | SiO      | Lift off    | 2,000          | Passivation                       |
| M5      | AuPdTi   | Lift off    | 300            | Under bump metal                  |
| M6      | PbBiSn   | Lift off    | 50,000         | Solder bump                       |

\* RIE: Reactive Ion Etching

### II. FABRICATION PROCESS

A cross-sectional view around the Josephson junction of the flip-flop is shown in Fig.1. A $2.0\,\mu$m $\times$ $2.5\,\mu$m AlOx Josephson junction is formed between the Nb base and counter electrodes. Table 1 lists the layer construction of the circuit. Fifteen masks are required, including those for the solder bumps on the pads shown in Fig.2, which are indispensable for GHz clock operation of the processor.

An SEM photograph of the 2-input OR·AND gate cell is shown in Fig.3. It is the wired connection of two three-junction interferometers [7] and occupies an area of 115 $\mu$m × 95 $\mu$m.

### III. CONSTRUCTION OF THE PROCESSOR

A photomicrograph of the processor is shown in Fig.4. The block diagram of the processor is shown in Fig.5. The features are listed in Table 2. Of the 192 bits of the register file, 128 bits serve as the instruction cache and the remaining 64 bits hold data. The 256-word × 4-bit external memory is addressed by an 8-bit program counter (PC1 and PC2).
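The storage figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check; the constant and function names are illustrative and do not come from the paper, and the 16×4b×3 organisation is taken from Table 2.

```python
# Consistency check of the storage figures quoted in the text.
# All names here are illustrative assumptions, not identifiers from the paper.

REGISTER_FILE_BITS = 16 * 4 * 3       # "16x4bx3" organisation listed in Table 2
INSTRUCTION_CACHE_BITS = 128          # portion used as the instruction cache
DATA_BITS = REGISTER_FILE_BITS - INSTRUCTION_CACHE_BITS

EXTERNAL_WORDS = 256                  # external memory depth
WORD_BITS = 4                         # external memory width
EXTERNAL_RAM_BITS = EXTERNAL_WORDS * WORD_BITS

PC_BITS = 8                           # PC1 and PC2 together
ADDRESSABLE_WORDS = 2 ** PC_BITS      # words reachable by the program counter


def storage_summary():
    """Return the derived figures as a dictionary for easy inspection."""
    return {
        "register_file_bits": REGISTER_FILE_BITS,
        "data_bits": DATA_BITS,
        "external_ram_bits": EXTERNAL_RAM_BITS,
        "addressable_words": ADDRESSABLE_WORDS,
    }


if __name__ == "__main__":
    for name, value in storage_summary().items():
        print(f"{name}: {value}")
```

The 8-bit program counter covers exactly the 256 words of the external memory, and 256 words of 4 bits give the 1Kb capacity quoted in the abstract.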

The 16 × 30-bit ROM decodes the instructions. The 16 instructions consist of add, subtract, multiply, divide, shift, load, store, program transfer from the external RAM to the register file, and branch on condition. All instructions except branch on condition and load from the external RAM are processed by a three-stage pipeline of instruction fetch, data fetch, and decode/execute. A load from the external RAM requires one additional cycle.
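As a rough illustration of the pipeline behaviour described above, the following sketch counts clock cycles for a straight-line instruction sequence. It assumes an ideal three-stage pipeline in which each external-RAM load costs exactly one extra cycle; the cost of branch on condition is not given in the text and is deliberately not modelled. The function names are hypothetical.

```python
# Idealised cycle count for a three-stage pipeline.
# Assumptions (not from the paper): no stalls other than the one extra
# cycle per external-RAM load, and no branches in the sequence.

PIPELINE_STAGES = 3  # instruction fetch, data fetch, decode/execute


def cycles_for(n_instructions, n_external_loads=0):
    """Clock cycles needed to complete a branch-free instruction sequence."""
    if n_instructions == 0:
        return 0
    fill_cycles = PIPELINE_STAGES - 1  # cycles before the first result appears
    return fill_cycles + n_instructions + n_external_loads


def instructions_per_cycle(n_instructions, n_external_loads=0):
    """Average throughput; approaches 1 for long sequences without loads."""
    return n_instructions / cycles_for(n_instructions, n_external_loads)


if __name__ == "__main__":
    print(cycles_for(100))          # long sequence, no external loads
    print(cycles_for(100, 10))      # same sequence with 10 external loads
    print(instructions_per_cycle(100))
```

Under these assumptions the throughput approaches one instruction per cycle as the sequence grows, which is the condition behind the peak-performance figure.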

All circuits in the processor except the DC flip-flops are driven by single-phase AC power. To reduce the crosstalk from the AC power to the I/O lines and to facilitate high-frequency operation of the processor, an 8-fold stacked AC supply has been introduced.



Fig.2 SEM photograph of a solder bump on the pad.



Fig.3 SEM photograph of a 2-input OR·AND gate.



Fig.4 Photomicrograph of the processor (7 mm × 7 mm).

The AC-driven circuit is divided into four blocks to form functional groups as shown in Fig. 6. These blocks are separated by ground plane trenches 5 $\mu$m wide and powered in series. Furthermore, equal numbers of gates are assigned to the positive and negative polarities of the AC power. As a result, the AC gates are powered 8-fold in series in total. This block structure was selected to minimize the crossings over the ground plane trenches. Where a long path to another block is unavoidable, buffer gates are inserted to separate the signal path.

The amplitude of the AC power current is 200 mA at the maximum duty condition of the regulator. The power consumption of the whole chip is 40 mW under this condition, of which 6.2 mW is consumed by the DC circuit and 18 mW by the regulator.
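The power figures above can be turned into a simple budget. The sketch below computes the part of the 40 mW not attributed to the DC circuit or the regulator, and the energy per instruction at the peak rate of 1 GIPS. Attributing the remainder to the AC-driven gates is an assumption made here for illustration; the text does not state it explicitly. The names are hypothetical.

```python
# Power budget from the quoted figures (all values in mW).
# Assumption: whatever is not consumed by the DC circuit or the regulator
# is attributed to the AC-driven gates; the paper does not say so directly.

TOTAL_MW = 40.0
DC_CIRCUIT_MW = 6.2
REGULATOR_MW = 18.0


def remaining_mw():
    """Power left after the DC circuit and regulator are subtracted."""
    return TOTAL_MW - DC_CIRCUIT_MW - REGULATOR_MW


def energy_per_instruction_pj(power_mw=TOTAL_MW, rate_gips=1.0):
    """Energy per instruction in pJ; mW divided by GIPS gives pJ directly."""
    return power_mw / rate_gips


if __name__ == "__main__":
    print(f"remaining: {remaining_mw():.1f} mW")
    print(f"energy per instruction at peak rate: {energy_per_instruction_pj():.1f} pJ")
```

At the peak rate this corresponds to 40 pJ per instruction for the whole chip, regulator included.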

Under normal operating conditions with the regulator drive, the average gate delay is 20 ps/stage. The propagation delay of the transmission line is 9.0 ps/mm. Each 165 $\mu$m increase in transmission line length is equivalent to one fan-out. The critical path, about 700 ps, appears in the parallel multiplication operation.
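A first-order path delay estimate can be built from these figures. The sketch below assumes that the line delay is quoted per millimetre and that gate and line delays simply add; both are simplifications made here, not statements from the paper. The fan-out equivalence is used only to express line length as a loading figure, without assigning it a delay. All names are hypothetical.

```python
# First-order delay estimate from the quoted figures.
# Assumptions: line delay is per millimetre, and stage and line delays add.

GATE_DELAY_PS = 20.0           # average gate delay per stage
LINE_DELAY_PS_PER_MM = 9.0     # transmission line propagation delay
FANOUT_EQUIV_UM = 165.0        # line length equivalent to one fan-out


def path_delay_ps(stages, wire_um=0.0):
    """Sum of gate delays and line propagation delay along one path."""
    wire_mm = wire_um / 1000.0
    return stages * GATE_DELAY_PS + wire_mm * LINE_DELAY_PS_PER_MM


def equivalent_fanouts(wire_um):
    """Express a line length as the number of fan-outs it is equivalent to."""
    return wire_um / FANOUT_EQUIV_UM


if __name__ == "__main__":
    print(path_delay_ps(35))             # gate delays alone
    print(path_delay_ps(10, 1000.0))     # 10 stages plus 1 mm of line
    print(equivalent_fanouts(330.0))     # loading of a 330 um line
```

With no wiring at all, 35 stages of 20 ps already account for 700 ps, so the real critical path must contain fewer stages than that.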

A finite AC power transition time $\tau$ must also be included in the cycle time to keep the punchthrough probability low. Using formula (13) of [8], $\tau = 300$ ps is required to suppress the punchthrough probability below $10^{-17}$. With $\tau$ set to 300 ps, a critical path delay of 700 ps fits within the 1 ns cycle of a 1 GHz clock. When the pipeline is operating smoothly, one instruction is processed per clock cycle. Thus, the peak performance is 1 GIPS.
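The cycle-time argument above amounts to a simple budget: the clock period must cover the power transition time plus the critical path. A minimal sketch, assuming the two terms simply add and using hypothetical function names:

```python
# Cycle-time budget: clock period = power transition time + logic window.
# Assumption: the two contributions add with no further overhead.


def cycle_time_ps(clock_ghz):
    """Clock period in picoseconds."""
    return 1000.0 / clock_ghz


def logic_window_ps(clock_ghz, transition_ps):
    """Time left for logic after the AC power transition is subtracted."""
    return cycle_time_ps(clock_ghz) - transition_ps


def meets_timing(critical_path_ps, clock_ghz, transition_ps):
    """True if the critical path fits in the logic window."""
    return critical_path_ps <= logic_window_ps(clock_ghz, transition_ps)


def peak_gips(clock_ghz, instructions_per_cycle=1.0):
    """Peak instruction rate in GIPS for a given clock frequency."""
    return clock_ghz * instructions_per_cycle


if __name__ == "__main__":
    print(logic_window_ps(1.0, 300.0))       # window available at 1 GHz
    print(meets_timing(700.0, 1.0, 300.0))   # quoted critical path
    print(peak_gips(1.0))                    # one instruction per cycle
```

With the quoted values the budget closes exactly: 300 ps of transition time plus 700 ps of critical path fill the 1 ns cycle with no slack.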

### IV. MEASUREMENT RESULTS

Table 2 Features of the processor.

| Item              | Value           |
|-------------------|-----------------|
| Instructions      | 16              |
| ROM               | 480b            |
| Register File     | 16×4b×3         |
| Gates             | 3665            |
| Process           | Nb/AlOx/Nb      |
| Design Rule       | 2.5 μm          |
| Chip Size         | 7.0 mm × 7.0 mm |
| Clock Rate        | 1 GHz           |
| Power Dissipation | 40 mW           |
| I/O pins          | 88              |

The multiply operation of the ALU is shown in Fig.7. In the operation A $\times$ B, A is fixed at '1101' ('13' in decimal), while B cycles through '0' to '15' under the internal counter. The waveforms are the characteristic DC NRZ pulses. Correct operation is confirmed.
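The expected outputs for this multiply test can be tabulated in software for comparison with the measured waveforms. The sketch below fixes A at '1101' and steps B from 0 to 15. Representing the result as a full 8-bit product is an assumption made here, since the text does not state the output width; the function name is hypothetical.

```python
# Reference patterns for the multiply test: A fixed at 13, B stepping 0..15.
# Assumption: the product is shown as a full 2*width-bit value.

A = 0b1101  # '1101', i.e. 13 in decimal


def multiply_patterns(a=A, width=4):
    """Return (B, product) pairs as binary strings for every value of B."""
    rows = []
    for b in range(2 ** width):
        product = a * b
        rows.append((format(b, f"0{width}b"), format(product, f"0{2 * width}b")))
    return rows


if __name__ == "__main__":
    for b_bits, product_bits in multiply_patterns():
        print(b_bits, product_bits)
```

Each output bit of the product then gives one NRZ bit sequence of length 16 that can be compared against the corresponding trace.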

The operating waveforms of the external RAM access are shown in Fig.8. The data from the processor is set to '1' and '0' alternately. WRITE (the write control signal) is '1' in each period, and the address is '0'. The 1-bit INDATA from the memory correctly alternates between '1' and '0'.

High-speed operation of the whole processor chip is difficult at present because of the operating margin degradation due to crosstalk. However, the lower bits of the outputs from the program counter could be identified at frequencies up to 1 GHz as shown in Fig.9.

### V. SUMMARY

A 4-bit data processor with a 16-instruction set and 1Kb-external-RAM access capability has been designed, fabricated, and tested. Function tests showed successful operations. Part of the processor operated at 1 GHz.

This work was performed as a part of the Large Scale Project of AIST/MITI "R&D of Scientific Computing System" sponsored by NEDO (New Energy and Industrial Technology Development Organization).

### REFERENCES

[1] S. Kotani; Dig. Tech. ISSCC'90, 148.
[2] S. Kotani; Dig. Tech. Symp. VLSI'90, 69.
[3] H. Suzuki; IEEE Trans. Magn., 25, 2 (1989), 737.
[4] S. Tahara; IECE Tech. Rep., ICD90-51 (1990), 55.
[5] H. Nakagawa; IECE Tech. Rep., SCE89-59 (1990), 43.
[6] Y. Hatano; Dig. Tech. ISSCC'89, 234.
[7] T. R. Gheewala; IEEE J. Sol. Sta. Cir., 14 (1979), 783.
[8] E. P. Harris; IEEE Trans. Magn., 17, 1 (1981), 603.



Fig.7 Multiply operation of the ALU.



Fig.8 Operating waveforms of the RAM interface.



Fig.9 1 GHz operation of the program counter.