# Fabrication of a Standby-Power-Free TMR-Based Nonvolatile Memory-in-Logic Circuit Chip with a Spin-Injection Write Scheme

Shoun Matsunaga<sup>1</sup>, Jun Hayakawa<sup>2</sup>, Shoji Ikeda<sup>3</sup>, Katsuya Miura<sup>2</sup>, Tetsuo Endoh<sup>4</sup>, Hideo Ohno<sup>3</sup>, and Takahiro Hanyu<sup>1</sup>

<sup>1</sup> Research Institute of Electrical Communication (RIEC), Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan. Phone: +81-22-217-5508, E-mail: zhao-yun@ngc.riec.tohoku.ac.jp

<sup>2</sup> Hitachi Advanced Research Laboratory, 1-280 Higashi-Koigakubo, Kokubunji, Tokyo 185-8601, Japan

<sup>3</sup> Laboratory for Nanoelectronics and Spintronics, RIEC, Tohoku University, 2-1-1 Katahira, Aoba-ku, Sendai 980-8577, Japan

<sup>4</sup> Center for Interdisciplinary Research (CIR), Tohoku University, Aramaki aza Aoba 6-3, Aoba-ku, Sendai 980-8578, Japan

## Abstract

A tunneling-magneto-resistance (TMR)-based memory-in-logic circuit chip with a spin-injection write scheme, in which nonvolatile storage functions are distributed over a logic-circuit plane, is fabricated for the first time, toward a standby-power-free VLSI with "quick" wake-up/sleep functions that require no re-load from, or write-back into, an off-chip nonvolatile storage system. Its basic behavior is successfully confirmed, and its usefulness is demonstrated in comparison with the conventional approach.

## Summary

A drastic increase in static power dissipation due to leakage current is one of the most serious problems in recent nano-scaled VLSI chips. One possible solution is to use a nonvolatile memory module placed outside the VLSI chip, which makes it possible to completely eliminate the static power dissipation in the VLSI chip, as shown in Fig.1. Fig.1(a) shows a typical structure of a standby-power-free VLSI system using a nonvolatile memory module. To put the system to sleep (change from "active" mode to "standby" mode), the data stored inside the VLSI chip is first written back into the nonvolatile memory module before the power supply is shut down. Conversely, to wake the system up (change from "standby" mode to "active" mode), the data stored in the nonvolatile memory module is re-loaded into the VLSI chip. With this conventional method, therefore, putting the system to sleep or waking it up takes a long time and dissipates a large amount of power, in proportion to the amount of stored data as well as the global-wire length between the VLSI chip and the nonvolatile memory module.

To solve these problems, it is important to implement a nonvolatile memory-in-logic [1] circuit in which nonvolatile storage elements are distributed over a logic-circuit plane, as shown in Fig.1(b). As one possible approach to implementing a compact memory-in-logic circuit, a ferroelectric-based functional logic gate, in which logic and storage functions are merged compactly, has been reported [2]. However, its storage function cannot be used as an ordinary register (or latch) because the write/erase endurance of ferroelectric materials is limited to fewer than $10^{12}$ cycles.

To realize a practical memory-in-logic circuit that overcomes the above restriction, a tunneling-magneto-resistance (TMR)-based memory-in-logic circuit style, hereafter called "TMR logic," has been proposed [3]. TMR logic is suitable for fully parallel VLSI computation such as motion-vector detection used in MPEG encoding, where arithmetic operations between current (external-input) and reference (stored-input) pixels are performed simultaneously [4]. In this paper, therefore, we fabricate, for the first time, a TMR-based full-adder chip with spin-injection TMR devices as a basic arithmetic unit of such fully parallel VLSI computation, using a 0.18µm CMOS/TMR process.

Figs.2(a) and (b) show a cross-sectional TEM image and the R-I characteristic of a fabricated TMR device used here, which consists of a CoFeB/Ru/CoFeB free layer, a 1.0nm MgO tunnel barrier, and a CoFeB fixed layer [5]. As shown in Fig.2(c), depending on the spin (magnetization) direction of the free layer with respect to that of the fixed layer, the resistance of the device takes one of two distinct states: a low resistance R<sub>P</sub> when the spin directions are parallel and a high resistance R<sub>AP</sub> when they are anti-parallel, where the MR ratio $(=100 \times (R_{AP}-R_{P})/R_{P})$ is about 100 percent. Since the fabricated spin-injection-type TMR device has a nonvolatile storage capability, the power supply of TMR logic can be cut off without destroying the stored data, which completely eliminates its static power dissipation. Moreover, a spin-injection TMR device achieves a smaller cell area and lower-power operation than a conventional magnetic-field-type TMR device.
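The MR-ratio definition above can be checked with a short calculation. The following is a minimal sketch; the resistance values `R_P` and `R_AP` are hypothetical numbers chosen only to illustrate an MR ratio of about 100 percent and are not measured values from the fabricated device.

```python
def mr_ratio_percent(r_p: float, r_ap: float) -> float:
    """MR ratio in percent, as defined in the text: 100 * (R_AP - R_P) / R_P."""
    if r_p <= 0:
        raise ValueError("R_P must be positive")
    return 100.0 * (r_ap - r_p) / r_p


# Hypothetical resistances (assumed for illustration, not from the paper).
R_P = 1000.0   # ohms, parallel (low-resistance) state
R_AP = 2000.0  # ohms, anti-parallel (high-resistance) state

ratio = mr_ratio_percent(R_P, R_AP)
print(f"MR ratio = {ratio:.0f} percent")
```

An MR ratio of 100 percent thus corresponds to R<sub>AP</sub> being twice R<sub>P</sub>, which is the resistance contrast that the read circuit has to distinguish.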

As a design example, Fig.3(a) shows a circuit diagram of a SUM circuit with a one-bit nonvolatile storage capability, where the output S of the SUM function satisfies $S=A \oplus B \oplus C_i$ ($\oplus$: Boolean exclusive-OR operator). To reduce dynamic power dissipation while maintaining switching speed, the TMR logic circuit is implemented by combining dynamic current-mode logic (DyCML) with TMR devices. It consists of three basic components: a cross-coupled keeper (CCK), a logic-circuit tree, and a dynamic current source (DCS). The CCK generates complementary binary outputs in accordance with the result of a magnitude comparison between two complementary current signals, where a small current difference can be detected immediately by the feedback structure of the CCK. The use of the DCS makes it possible to cut off the steady current from $V_{DD}$ to GND, which results in low power dissipation.

In the proposed hardware, TMR devices are embedded in the SUM circuit, which reduces the transistor count in comparison with the corresponding CMOS realization. Its dynamic power dissipation is also greatly reduced because of the smaller number of active devices. Before the arithmetic operation is performed in the TMR logic-circuit chip, every stored input is programmed in advance. Two access transistors are provided as a peripheral data-write control circuit for programming a single TMR device. A pair of dual-rail bit lines, BL and BL', can be shared by all TMR devices in the same column of the logic-circuit module array. While the logic operation is performed in the TMR logic circuit, all the input voltage signals applied to the word lines are turned off. Fig.3(b) shows a photomicrograph of the proposed SUM circuit fabricated in a 0.18µm CMOS/TMR process. The effective area of the SUM circuit is about 166µm<sup>2</sup>.
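The logic function of the SUM circuit can be illustrated at the Boolean level. The sketch below is a purely behavioral model, assuming only the relation $S=A \oplus B \oplus C_i$ and the complementary (dual-rail) outputs described above; it does not model the currents, the CCK, or the DCS, and the function name `sum_bit` is hypothetical.

```python
from itertools import product


def sum_bit(a: int, b_stored: int, c_in: int) -> tuple[int, int]:
    """Return the complementary outputs (S, S') of S = A xor B xor Ci.

    b_stored stands for the input held in the TMR device; a and c_in
    are the external inputs.
    """
    s = a ^ b_stored ^ c_in
    return s, 1 - s


# Enumerate the truth table for both values of the stored input B.
for b, a, c_in in product((0, 1), repeat=3):
    s, s_bar = sum_bit(a, b, c_in)
    print(f"B={b} A={a} Ci={c_in} -> S={s} S'={s_bar}")
```

For a fixed stored input B, the circuit reduces to a two-input function of A and C<sub>i</sub>: exclusive-OR when B=0 and exclusive-NOR when B=1.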

Fig.4 shows the measured waveforms of the TMR-based SUM-circuit test chip. In this measurement, the stored inputs, B and B', are fixed to '0' (an anti-parallel spin-direction state of the TMR device) and '1' (a parallel spin-direction state), respectively, and periodic 1.0V peak-to-peak voltage signals are applied to CLK, CLK', A, A', C<sub>i</sub> and C<sub>i</sub>' under $V_{DD}$=1.0V. The measurement clearly demonstrates that the output S<sub>after</sub> (S right after power-on) is the same as S<sub>before</sub> (S just before power-off), which means that the stored data remains even if $V_{DD}$ is shut down and then turned on again. In this way, the quick wake-up/sleep behavior of the proposed TMR logic-circuit chip is confirmed without complicated operations such as re-load from, or write-back into, an off-chip nonvolatile storage device.
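The retention behavior observed in this measurement can be mimicked with a small behavioral model. The following is a sketch under stated assumptions: the class `TmrSumCell` and its methods are hypothetical, the TMR state is treated as an ideal nonvolatile bit, and only the encoding ('0' as anti-parallel, '1' as parallel) is taken from the measurement description.

```python
class TmrSumCell:
    """Behavioral sketch of a SUM cell whose stored input B is held in a TMR device."""

    # Encoding from the measurement description: '0' -> anti-parallel, '1' -> parallel.
    STATE_FOR_BIT = {0: "AP", 1: "P"}
    BIT_FOR_STATE = {"AP": 0, "P": 1}

    def __init__(self) -> None:
        self.tmr_state = "AP"  # nonvolatile: survives power-off
        self.powered = False
        self.output = None     # volatile: lost at power-off

    def program(self, b: int) -> None:
        """Write the stored input B into the TMR device."""
        self.tmr_state = self.STATE_FOR_BIT[b]

    def power_on(self) -> None:
        self.powered = True

    def power_off(self) -> None:
        # Only the volatile output node is cleared; the TMR state is kept.
        self.powered = False
        self.output = None

    def evaluate(self, a: int, c_in: int) -> int:
        """Compute S = A xor B xor Ci using the stored B."""
        if not self.powered:
            raise RuntimeError("cell is powered off")
        b = self.BIT_FOR_STATE[self.tmr_state]
        self.output = a ^ b ^ c_in
        return self.output


cell = TmrSumCell()
cell.program(0)                       # B = '0' (anti-parallel), as in the measurement
cell.power_on()
s_before = cell.evaluate(a=1, c_in=0)  # S just before power-off
cell.power_off()
cell.power_on()                       # no re-load step is needed
s_after = cell.evaluate(a=1, c_in=0)   # S right after power-on
print(s_before, s_after)
```

In this model, as in the measurement, the same external inputs yield the same output before and after the power cycle because the stored input is never lost.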

To demonstrate the usefulness of the proposed circuit, Table 1 summarizes a performance evaluation based on HSPICE simulation under a $0.18\mu$m CMOS/TMR process. In the conventional hardware structure, which consists of CMOS logic gates, SRAM and MRAM, the "sleep" transition takes 102 nsec/bit and 40 pJ/bit, and the "wake-up" transition takes 42 nsec/bit and 19 pJ/bit. In contrast, the proposed circuit incurs neither delay nor energy consumption for these transitions because the stored data is already held in the TMR-based nonvolatile devices. Moreover, the dynamic power dissipation of the proposed circuit is reduced to 23 percent of that of the conventional one, because the TMR logic-circuit structure makes it possible to reduce the number of current paths from V<sub>DD</sub> to GND.
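The figures quoted above follow from the per-operation entries of Table 1 by simple arithmetic, sketched below. The function `conventional_overhead` is hypothetical, and its linear scaling with the number of bits assumes that bits are transferred one at a time, which is an illustrative assumption rather than a result of the paper; as noted in Table 1, power-line switching and chip-to-chip transfer are not included.

```python
# Per-bit figures for the conventional CMOS/SRAM/MRAM structure (Table 1).
SLEEP_DELAY_NS = 2 + 100   # SRAM cell read + MRAM cell write
SLEEP_ENERGY_PJ = 4 + 36   # SRAM cell read + MRAM cell write
WAKE_DELAY_NS = 40 + 2     # MRAM cell read + SRAM cell write
WAKE_ENERGY_PJ = 15 + 4    # MRAM cell read + SRAM cell write


def conventional_overhead(n_bits: int) -> dict:
    """Sleep/wake-up overhead for n_bits, assuming strictly serial transfer."""
    return {
        "sleep_delay_ns": SLEEP_DELAY_NS * n_bits,
        "sleep_energy_pj": SLEEP_ENERGY_PJ * n_bits,
        "wake_delay_ns": WAKE_DELAY_NS * n_bits,
        "wake_energy_pj": WAKE_ENERGY_PJ * n_bits,
    }


# Active power at 500 MHz from Table 1, in microwatts.
POWER_CONVENTIONAL_UW = 71.1
POWER_PROPOSED_UW = 16.3
power_ratio_percent = 100.0 * POWER_PROPOSED_UW / POWER_CONVENTIONAL_UW

print(conventional_overhead(1))
print(f"proposed / conventional active power = {power_ratio_percent:.1f} percent")
```

The per-bit sums reproduce the 102 nsec, 40 pJ, 42 nsec and 19 pJ entries, and the active-power ratio rounds to the 23 percent stated in the text.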

### Acknowledgements

This work was supported by Research and Development for Next-Generation Information Technology by the Ministry of Education, Culture, Sports, Science and Technology of Japan. The authors wish to thank Kenchi Ito of Hitachi, Japan, and Yuzo Ohno and Atsushi Matsumoto of Tohoku University, Japan, for their help in chip fabrication and for discussions. This work was also supported by VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with Synopsys, Inc. and Cadence Design Systems, Inc.



Fig.1: Standby-power-free VLSI system (a) Using a nonvolatile memory module as a backup storage device, (b) Proposed TMR-based nonvolatile memory-in-logic circuit.

#### References

- [1] W. H. Kautz, *IEEE Trans. Computers*, 18, 8, 719/727, Aug. 1969.
- [2] H. Kimura, et al., *IEEE JSSC*, 39, 6, 919/926, Jun. 2004.
- [3] A. Mochizuki, et al., *IEICE*, E88-A, 6, 1408/1415, Jun. 2004.
- [4] M. Hariyama, et al., *IEICE*, E84-C, 3, 382/389, Mar. 2001.
- [5] S. Ikeda, et al., *IEEE TED*, 54, 5, 991/1002, May 2007.
- [6] T. Kawahara, et al., *IEEE ISSCC*, 480/481, Feb. 2007.



Fig.2: TMR device (a) Cross-sectional TEM image, (b) R-I characteristic, (c) Operation mechanism of a spin-injection type TMR, (d) Symbol.



Fig.3: TMR-based SUM circuit with a one-bit stored input (a) Circuit diagram, (b) Photomicrograph of a fabricated test chip.



Fig.4: Measured waveforms of a TMR-based SUM circuit chip.

Table 1: Comparison of full adders including SUM circuit.

|                                      |                | CMOS/SRAM/MRAM                                                                   | Proposed   |
|--------------------------------------|----------------|----------------------------------------------------------------------------------|------------|
| Active                               | Delay          | 224 psec                                                                         | 219 psec   |
|                                      | Power @ 500MHz | 71.1 µW                                                                          | 16.3 µW    |
| Standby                              | Power          | 0 µW                                                                             | 0 µW       |
| Write<br>(Data update)               | Delay          | 2 nsec/bit                                                                       | 3 nsec/bit |
|                                      | Energy         | 4 pJ/bit                                                                         | 4 pJ/bit   |
| Sleep \*1)<br>(Active to Standby)    | Delay          | 102 nsec/bit<br>(2 nsec @ SRAM cell read,<br>100 nsec @ MRAM cell write [6])     | 0 nsec/bit |
|                                      | Energy         | 40 pJ/bit<br>(4 pJ @ SRAM cell read,<br>36 pJ @ MRAM cell write [6])             | 0 pJ/bit   |
| Wake-up \*1)<br>(Standby to Active)  | Delay          | 42 nsec/bit<br>(40 nsec @ MRAM cell read [6],<br>2 nsec @ SRAM cell write)       | 0 nsec/bit |
|                                      | Energy         | 19 pJ/bit<br>(15 pJ @ MRAM cell read [6],<br>4 pJ @ SRAM cell write)             | 0 pJ/bit   |

\*1) Delay and energy estimation do not include the effect due to power-line on/off and data transfer between a CMOS-SRAM chip and an MRAM chip.