# A Hierarchical 512-Kbit SRAM with 8 Read/Write Ports in 130nm CMOS

Seiji Fukae<sup>1</sup>, Nobuhiko Omori<sup>1</sup>, Tetsushi Koide<sup>1</sup>, Hans Jürgen Mattausch<sup>1</sup>, and Tetsuo Hironaka<sup>2</sup>

<sup>1</sup>Research Center for Nanodevices and Systems, Hiroshima University, 1-4-2 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan. Phone: +81-824-24-6265, Fax: +81-824-22-7185, E-mail: {fukae,koide,hjm}@sxsys.hiroshima-u.ac.jp
<sup>2</sup>Department of Computer Engineering, Hiroshima City University, 3-4-1 Ozuka-Higashi, Asaminami-ku, Hiroshima 731-3194, Japan

# 1. Introduction

The newest microprocessors, which incorporate advanced parallel-processing technology, fetch, decode and execute 2-4 instructions in parallel at clock frequencies on the order of GHz. To increase parallelism further, they are expected to develop into on-chip multiprocessors, which additionally process independent program threads as tasks in parallel [1]. Since data processing requires the transfer of data to and from memory, a larger access bandwidth between microprocessor and memory is needed to raise processing speed.

Increasing the port number is necessary to raise the access bandwidth beyond the limit of 1-port memories. The main conventional multi-port memories are based either on multi-port storage cells or on multiple 1-port banks with a crossbar switch. However, these memories cannot satisfy the requirements of high area efficiency and low access-conflict probability simultaneously.

In a multi-port-cell-based memory, all ports are implemented in each cell, so the area increases with the square of the port number [2]. A conventional multi-bank memory with a crossbar switch promises higher area efficiency because 1-port storage cells are used, but large bank numbers are difficult to realize. The conventional crossbar-based multi-bank memory therefore suffers from a large access-conflict probability.

# 2. New Architecture with High Area Efficiency and Low Access-Conflict Probability

We have proposed the Hierarchical Multi-port Memory Architecture (HMA) shown in Fig. 1, which promises to solve the problems of the crossbar-based architecture [3]. The new architecture suppresses the access-conflict probability by allowing large bank numbers and realizes high area efficiency at the same time. An HMA memory has two original circuits, a "1/N-port convertor" in the first hierarchy level and an "access-conflict-resolve circuit" in the second hierarchy level. The former converts the port number from N ports (bank externally) into 1 port (bank internally), and consists of a "bank-enable circuit", an "active-address-connect circuit" and an "active-data-connect circuit", as shown in Fig. 2. The access-conflict-resolve circuit grants the access permission to one port when two or more ports access the same bank. The main differences between HMA and the crossbar architecture are that the port convertors are located in each bank and that a column/row decoder concept is used, which allows a matrix arrangement of the banks.

Fig. 3 shows the ratio of the area of a bank, including 1/N-port convertor and routing, to the area of multi-port cells representing the same storage capacity. The area advantage of an HMA memory grows with the port number. For example, in the case of 32 ports, the HMA memory can be 95% smaller than a multi-port-cell-based memory.

Fig. 4 shows the critical access path of the HMA memory. First, the access ports simultaneously submit the bank addresses to the access-conflict-resolve circuit and the row/column bank selectors in the second hierarchy level, as well as the bank-internal addresses and data (write case only) to the banks. Access conflicts are resolved in parallel with the decoding of the rows and columns of the accessed banks. The outputs of the access-conflict-resolve circuit (port blocking signals) are applied to the last gate of the row/column bank selectors, where they grant the access permission to a port or remove it.

Because access conflicts are resolved in parallel with the bank access, the corresponding delay time can be hidden sufficiently. The port blocking signal can also be applied to the input circuits of the 1/N-port convertors, so that even more of the delay time of the access-conflict-resolve circuit can be hidden. If the port which accesses a bank obtains the access permission, its bank-internal access address and data are connected to the 1-port memory. Otherwise, the 1/N-port conversion and the access are inhibited.
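The grant/block behavior described above can be illustrated with a small behavioral sketch. It is not the circuit itself: the function name `resolve_conflicts` and the rule that a lower port index wins a conflict are assumptions made for illustration, since the text only states that exactly one of the conflicting ports obtains the access permission.

```python
from typing import List, Optional


def resolve_conflicts(bank_requests: List[Optional[int]]) -> List[bool]:
    """Behavioral model of access-conflict resolution.

    bank_requests[p] is the bank address requested by port p,
    or None if port p is idle in this cycle.
    Returns one grant flag per port (False = port is blocked or idle).

    Assumption (hypothetical): ports are ordered by priority,
    so port 0 always wins and the last port always loses a conflict.
    """
    claimed_banks = set()
    grants = []
    for bank in bank_requests:  # iterate in priority order
        if bank is None or bank in claimed_banks:
            # Idle port, or a higher-priority port already owns this bank.
            grants.append(False)
        else:
            claimed_banks.add(bank)
            grants.append(True)
    return grants


# Ports 0 and 2 collide on bank 3; ports 1 and 4 collide on bank 5.
print(resolve_conflicts([3, 5, 3, None, 5, 7]))
```

In this model every bank receives at most one granted port per cycle, which corresponds to the 1-port interface that each bank sees internally.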

# 3. Design Example in 130nm CMOS Technology

An 8-port, 512-Kbit HMA memory, with read/write capability for all 8 ports, is designed with 5 metal layers in a 130nm CMOS technology. Table I lists the parameters characterizing the designed HMA memory. We adopted the Port Importance Hierarchy (PIH) algorithm [4] to resolve access conflicts. The access rejection probability in Table I is for the port with the lowest priority, that is, the port which always loses its access permission when it is involved in an access conflict. Random accesses from each port to the banks are assumed. The realized HMA memory has a very low access-conflict probability of 2.7% due to its large number of 256 banks. The memory is operated in a synchronous mode with the conventional access and precharge method. In particular, the first and second hierarchy levels are activated simultaneously.
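The 2.7% figure can be reproduced with a simple probability model. The sketch below assumes that all 8 ports issue an access in every cycle and that each port picks one of the 256 banks independently and uniformly at random; under these assumptions the lowest-priority port is rejected whenever at least one of the other 7 ports targets the same bank. The function name is hypothetical, and the model is an interpretation that happens to match the stated value, not a derivation taken from the paper.

```python
def lowest_priority_rejection_probability(num_ports: int, num_banks: int) -> float:
    """Probability that the lowest-priority port loses its access.

    Assumptions (not stated in this form in the source):
    - every port accesses some bank in every cycle,
    - bank addresses are independent and uniformly distributed.
    The port is rejected if any of the other (num_ports - 1) ports
    selects the same bank.
    """
    p_no_collision = (1.0 - 1.0 / num_banks) ** (num_ports - 1)
    return 1.0 - p_no_collision


p = lowest_priority_rejection_probability(num_ports=8, num_banks=256)
print(f"{100 * p:.2f}%")  # prints 2.70%
```

The same function shows why a large bank number matters: with the same assumptions and only 16 banks, the rejection probability for the lowest-priority port rises above 35%.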

Fig. 5 shows the layout of a bank with 8 read/write ports and 2-Kbit bank capacity. The bank-internal decoders are implemented with dynamic CMOS logic circuits. Simple nMOS transfer circuits are used for the write-direction switches in the active-data-connect circuits. Such an implementation would be impossible in the corresponding cross-points of a conventional crossbar-based memory, because the cross-points need to drive long wires between the cross-points and the input buffers of the accessed banks. The 1/8-port convertor occupies about 20% of the bank area. The area of the bank is about 78% smaller than that of the corresponding 8-port-cell-based memory. This large area reduction results from the combination of many ports (8) and a large bank capacity (2 Kbit).

Fig. 6 shows the chip photo of the fabricated HMA memory with 8 ports and 512-Kbit capacity, whose performance is presently being measured. The second hierarchy level of the HMA memory could be kept small and occupies only about 18% of the complete design. Because 5 metal layers were available in this design, we routed the global wires (addresses, data, read/write enable signals) in the 4th and 5th metal layers entirely over the banks, which contributed to the small size of the design. The access times obtained by analog circuit simulation (HSPICE, based on the schematic entry) are 2.02ns for reading and 1.63ns for writing, as shown in Fig. 7 and Fig. 8, respectively. In each graph, the interval from a left circle to a right circle marks the delay of the corresponding event. A 1-port SRAM with the same access bandwidth would require an access time of 250ps.
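One way to read the 250ps figure is to divide the simulated read access time by the number of ports, since a 1-port memory would have to serve all 8 accesses one after another in the time the HMA memory serves them in parallel. The sketch below assumes this interpretation; the function name is hypothetical.

```python
def equivalent_one_port_access_time_ps(access_time_ns: float, num_ports: int) -> float:
    """Access time a 1-port memory would need to match an N-port memory.

    Assumption: the N parallel accesses of the multi-port memory are
    serialized on the single port, so each access gets 1/N of the time.
    """
    return access_time_ns * 1000.0 / num_ports


t_ps = equivalent_one_port_access_time_ps(access_time_ns=2.02, num_ports=8)
print(f"{t_ps:.1f} ps")  # about 252.5 ps, i.e. roughly 250 ps
```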

# 4. Conclusion

We designed a 512-Kbit SRAM with 5 metal layers in a 130nm CMOS technology. The SRAM combines high area efficiency with low access-conflict probability due to a new bank-based multi-port memory architecture, the Hierarchical Multi-port Memory Architecture (HMA). In particular, due to the parallel operation of the 8 ports with read/write capability, it achieves a very large access bandwidth of about 2GByte/s. Hence, HMA is a very promising candidate for the application of multi-port memories in processors with high parallelism.
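The bandwidth figure follows from the design parameters if each of the 8 ports transfers one 8-bit word per clock cycle at 250MHz. The 250MHz clock is taken from the operating point quoted for the power value in Table I; treating it as the access rate of every port is an assumption of this sketch, and the function name is hypothetical.

```python
def aggregate_bandwidth_bytes_per_s(num_ports: int, word_bits: int, clock_hz: float) -> float:
    """Peak aggregate bandwidth of a multi-port memory.

    Assumption: every port completes one word access per clock cycle
    and no access is rejected.
    """
    bits_per_second = num_ports * word_bits * clock_hz
    return bits_per_second / 8.0  # 8 bits per byte


bw = aggregate_bandwidth_bytes_per_s(num_ports=8, word_bits=8, clock_hz=250e6)
print(f"{bw / 1e9:.1f} GByte/s")  # prints 2.0 GByte/s
```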

#### Acknowledgments

This research is supported by the Semiconductor Technology Academic Research Center (STARC). The VLSI chip in this study has been fabricated in the chip fabrication program of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with STARC and Dai Nippon Printing Corporation.

#### References



Fig. 1: Hierarchical multi-port memory architecture (HMA).





Fig. 4: Critical path of the designed HMA memory.

Fig. 3: N-port SRAM area ratio (HMA / multi-port cell architecture).

Table I: Parameters for the fabricated HMA memory.

| Parameter                    | Value                       |
|------------------------------|-----------------------------|
| Technology, supply voltage   | 130nm CMOS, 1.2V            |
| Port number (all read/write) | 8                           |
| Word length                  | 8-bit                       |
| Bank capacity                | 2048-bit (8-bit × 256-word) |
| Bank number                  | 256                         |
| Total memory capacity        | 512-Kbit                    |
| Access time                  | 2.02ns                      |
| Power dissipation            | 17.2mW @ 250MHz             |
| Access rejection probability | 2.70%                       |



Fig. 7: Read simulation of the designed 8-port HMA memory. (a) From the clock edge to the activation of a bank. (b) From the activation of a bank to the activation of a word line. (c) From the activation of a word line to the output.

- [1] P. P. Gelsinger, *IEEE International Solid-State Circuits Conf. Digest of Tech. Papers (ISSCC)*, pp. 22-25 (2001).
- [2] Y. Tatsumi, et al., *IEE Electronics Letters* 35, pp. 2185-2187 (1999).
- [3] H. J. Mattausch, *IEE Electronics Letters* 35, pp. 1441-1442 (1997).
- [4] N. Omori, et al., *Ext. Abst. of SSDM 2000*, pp. 36-37 (2000).



Fig. 2: 1/N-port convertor for N ports, m-bit cell address and i-bit data word length.



Fig. 5: Layout of a bank with 8 ports and 2-Kbit bank capacity in the designed HMA memory.







Fig. 8: Write simulation of the designed 8-port HMA memory. (a) From the clock edge to the activation of a bank. (b) From the activation of a bank to the overwriting of a cell.