# Area-Efficient NanoBridge-based FPGA with Optimized Architecture

X. Bai, T. Sakamoto, Y. Tsuji, M. Miyamura, A. Morioka, R. Nebashi, M. Tada, N. Banno, K. Okamoto, N. Iguchi, H. Hada, and T. Sugibayashi

System Platform Research Laboratories, NEC Corporation, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan

E-mail: x-bai@bc.jp.nec.com

## Abstract

The logic cell architecture of a NanoBridge-based FPGA (NB-FPGA) is investigated in terms of cell area and signal delay. The area-delay product is minimized when the cluster size is 4 and the segment length is 4, owing to the small area and small capacitance of the configuration switch (here the NanoBridge, or NB). A 1.34x logic density improvement and a 15% energy saving, compared with the conventional NB-FPGA with a cluster size of 2, are demonstrated by implementing a lightweight block cipher.

## 1. Introduction

The large area, delay and power consumption of FPGAs limit their integration into battery-powered applications. To overcome these issues, the NB-FPGA has been proposed and has exhibited 30% dynamic power saving and 2.5x faster operation compared with a commercial SRAM-FPGA (Table 1) [1]. In this paper, we investigate the architecture of the NB-FPGA for better performance. FPGAs have high energy efficiency for highly parallel applications such as image processing and cryptography. We therefore implement a lightweight block cipher [2], which is suitable for small IoT devices, in NB-FPGAs to compare the architectures.
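As a quick arithmetic check, the headline figures quoted from [1] can be recomputed from the entries of Table 1. This is a minimal sketch that assumes the "2.5x" and "30%" claims are derived from the maximum speed at 0.8 V and the dynamic power at VDDmin, respectively; the dictionary keys are illustrative names, not taken from the paper.

```python
# Values copied from Table 1 (application: ALU).
nb_fpga = {"max_speed_mhz": 18.2, "dyn_power_uw_per_mhz": 28.0}
sram_fpga = {"max_speed_mhz": 7.1, "dyn_power_uw_per_mhz": 39.5}

# Ratio of maximum operating frequencies at 0.8 V.
speedup = nb_fpga["max_speed_mhz"] / sram_fpga["max_speed_mhz"]

# Fractional reduction in dynamic power per MHz at each device's VDDmin.
dyn_saving = 1.0 - nb_fpga["dyn_power_uw_per_mhz"] / sram_fpga["dyn_power_uw_per_mhz"]

print(f"speed-up at 0.8 V: {speedup:.2f}x")
print(f"dynamic power saving at VDDmin: {dyn_saving:.1%}")
```

Under that assumption the table gives roughly 2.56x and 29.1%, in line with the rounded values quoted in the text.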

## 2. Architecture of NB-FPGA and its optimization

The NB, which is fabricated between the M4 and M5 layers, is a non-volatile resistive-change switch with a very high OFF/ON resistance ratio (Fig. 1). Two serially connected NBs contribute to a low programming voltage and high off-state reliability [3]. In a conventional SRAM-FPGA, a CMOS switch composed of an SRAM cell and a pass transistor is used for programmable data routing (Fig. 2). The volatile SRAM cell (typically 6 transistors) causes large static power, standby power and area. The pass transistor, with its relatively large capacitance, causes large dynamic power and delay. In the NB-FPGA, the NB replaces the CMOS switch, resulting in low static power, standby power and area. The smaller capacitance of the NB (1/10 that of the CMOS switch) leads to low dynamic power and small delay.

Figure 3 shows the architecture of the NB-FPGA. Each cell is composed of a logic element (LE), an input multiplexer (IMUX) and a switch multiplexer (SMUX) [1]. Each LE consists of look-up tables (LUTs) and flip-flops. The cluster size (CS) is the number of LUTs in each LE [4]. The LEs are connected to each other by routing wires, IMUXs and SMUXs. The segment length (SL) is the number of LEs a routing wire spans before terminating [4]. The IMUX and SMUX are built from NB crossbar switches.
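The two architecture parameters can be captured in a small record, and the cluster size gives a simple lower bound on how many LEs a design needs. The sketch below is illustrative only: `NBFPGAArch` and `min_clusters` are hypothetical names, and the bound assumes every LE could be packed completely full, which a real packer generally does not achieve. The LUT and cluster counts are those reported in Table 3.

```python
from dataclasses import dataclass
from math import ceil


@dataclass(frozen=True)
class NBFPGAArch:
    """Hypothetical record of the two parameters defined in the text."""
    cluster_size: int    # CS: number of LUTs in each logic element (LE)
    segment_length: int  # SL: number of LEs a routing wire spans


def min_clusters(used_luts: int, cluster_size: int) -> int:
    """Lower bound on used LEs if every LE were filled with LUTs."""
    return ceil(used_luts / cluster_size)


optimized = NBFPGAArch(cluster_size=4, segment_length=4)

# Table 3 reports 469 used LUTs, with 241 used clusters at CS = 2
# and 121 used clusters at CS = 4.
for cs, reported in ((2, 241), (optimized.cluster_size, 121)):
    bound = min_clusters(469, cs)
    print(f"CS={cs}: bound={bound}, reported={reported}")
```

The bounds are 235 and 118 clusters, so the reported counts of 241 and 121 are within a few clusters of full packing in both cases.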

The data-transfer path in the NB-FPGA is characterized by the model shown in Fig. 4. Various paths are calculated based on HSPICE simulation of a 65 nm CMOS process, and the path delays are pre-characterized (Table 2). We use the 20 largest MCNC benchmark circuits to investigate the effect of the CS and SL on the area-delay product, since area can often be traded for delay [5]. Figure 5 shows the dependence of the area-delay product on CS and SL. The optimal CS is 4, which is smaller than that of the conventional SRAM-FPGA [5], whereas the optimal SL is 4, which is the same as that of the conventional SRAM-FPGA [4].
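One side of the trade-off can be read directly from Table 2: each node-to-node delay grows as the cluster size increases. The sketch below normalizes every path delay to its CS = 2 value. The area side of the area-delay product is not tabulated in this paper, so it is not modeled here; `relative_delay` is an illustrative helper name.

```python
# Node-to-node delays in ns, copied from Table 2.
CLUSTER_SIZES = (2, 4, 6, 8)
DELAY_NS = {
    "1to2": (0.171, 0.205, 0.225, 0.258),
    "1to3": (0.265, 0.328, 0.355, 0.412),
    "1to4": (0.268, 0.327, 0.356, 0.414),
    "5to3,6to1": (0.277, 0.321, 0.337, 0.377),
    "5to4": (0.279, 0.320, 0.337, 0.376),
}


def relative_delay(path: str) -> dict:
    """Delay of one path at each cluster size, relative to CS = 2."""
    base = DELAY_NS[path][0]
    return {cs: d / base for cs, d in zip(CLUSTER_SIZES, DELAY_NS[path])}


for path in DELAY_NS:
    ratios = relative_delay(path)
    print(path, {cs: round(r, 2) for cs, r in ratios.items()})
```

Going from CS = 2 to CS = 4 raises each per-path delay by roughly 15% to 24%, while a larger cluster lets more logic stay inside one LE; the optimum in Fig. 5 reflects the balance between these effects and cell area.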

## 3. Performance evaluation

Figure 6 shows a die photo of the NB-FPGA with the optimal CS and SL. The layouts of the logic cells in Fig. 7 show that the logic density is improved by 1.34 times. A lightweight block cipher, which fits extremely small hardware and provides notable performance in embedded software [2], is implemented on the NB-FPGA for performance comparison. The shmoo plot for the optimized NB-FPGA shows a wide operating region (Fig. 8). Figure 9 demonstrates that both the delay and the power consumption are reduced by employing the optimal CS. In particular, the power-delay product (or energy) of both NB-FPGAs, with non-optimal CS (= 2) and optimal CS (= 4), is at its minimum when the supply voltage (V<sub>DD</sub>) is 0.8 V (Fig. 9(c)).

Table 3 summarizes the properties of the NB-FPGAs. The count of used clusters in the optimized NB-FPGA is almost half of that in the non-optimized one [1], showing that the lightweight block cipher is efficiently mapped onto the optimized one. As a result, the area of the optimized NB-FPGA is reduced by 25% compared with that of the non-optimized NB-FPGA. The delay, power consumption and energy are reduced by 7%, 9% and 15%, respectively, at a V<sub>DD</sub> of 0.8 V.
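The percentages quoted above follow from the entries of Table 3, with energy taken as the power-delay product (mW × ns = pJ). The following sketch reproduces that arithmetic; the dictionary keys and the `reduction` helper are illustrative names, and reading the 1.34x density figure as the area ratio is an assumption.

```python
# Values copied from Table 3: (non-optimal CS = 2, optimal CS = 4).
TABLE3 = {
    "area_mm2": (1.35, 1.01),
    "delay_ns": (30.0, 28.0),
    "power_mw": (1.792, 1.632),
}


def reduction(before: float, after: float) -> float:
    """Fractional reduction when going from `before` to `after`."""
    return 1.0 - after / before


# Energy as power-delay product: mW * ns = pJ.
energy_pj = tuple(
    p * d for p, d in zip(TABLE3["power_mw"], TABLE3["delay_ns"])
)

for name, (before, after) in TABLE3.items():
    print(f"{name}: {reduction(before, after):.1%} reduction")
print(f"energy: {energy_pj[0]:.2f} pJ -> {energy_pj[1]:.2f} pJ, "
      f"{reduction(*energy_pj):.1%} reduction")

# Ratios corresponding to the density and energy-efficiency gains.
density_gain = TABLE3["area_mm2"][0] / TABLE3["area_mm2"][1]
efficiency_gain = energy_pj[0] / energy_pj[1]
print(f"area ratio: {density_gain:.2f}x, energy ratio: {efficiency_gain:.2f}x")
```

The products come to 53.76 pJ and about 45.70 pJ, matching the energy row of Table 3, and the area and energy ratios of about 1.34 and 1.18 are consistent with the gains stated elsewhere in the paper.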

## 4. Conclusions

The architecture of the NanoBridge-based FPGA is optimized for minimum energy-delay performance. Both high logic density (1.34x) and high energy efficiency (1.18x) are achieved, together with a slight delay reduction.

## Acknowledgements

A part of this work was performed under the METI R&D Program ("Leading New Technology for Energy and Environment") supported by NEDO.

## References

- [1] M. Miyamura, et al., FPGA Sympo., pp. 236-239 (2015).
- [2] T. Suzaki, et al., SAC, pp. 339-354 (2013).
- [3] M. Tada, et al., IEDM, p. 689 (2011).
- [4] V. Betz, et al., FPGA Sympo., pp. 59-68 (1999).
- [5] E. Ahmed, et al., FPGA Sympo., pp. 3-12 (2000).

Table 1 30% dynamic power saving and 2.5x faster operation in the low-voltage region have been achieved in an NB-FPGA with non-optimal cluster size = 2 [1] in comparison with a commercial SRAM-FPGA (application: ALU).

|                         | NB-FPGA (non-optimal cluster size = 2) | Commercial SRAM-FPGA |
|-------------------------|----------------------------------------|----------------------|
| Switch                  | NanoBridge (NB)                        | Pass Tr.             |
| Process node            | 65 nm                                  | 40 nm                |
| Number of LUTs          | 8192                                   | 1280                 |
| Max. speed at 0.8 V     | 18.2 MHz                               | 7.1 MHz              |
| VDDmin at 15 MHz        | 0.73 V                                 | 0.94 V               |
| Dynamic power at VDDmin | 28.0 µW/MHz                            | 39.5 µW/MHz          |
| Active power at VDDmin  | 550 µW                                 | 630 µW               |



Fig.1 NanoBridge: (a) TEM images, (b) ON/OFF state and (c) ON/OFF resistances [3]. Figure labels: Cu bridge; 1 kΩ\*; 500 MΩ\*.

Fig.3 Basic architecture of NB-FPGA. LE: logic element, IMUX: input multiplexer, SMUX: switch multiplexer. Labeled parameters: number of channels (NC), segment length (SL).

Fig.4 A model of data transfer in NB-FPGA. Figure labels: Node\_1 to Node\_6; LE; buffer; wire capacitance [fF/um]; NanoBridge C<sub>SW</sub> [fF], R<sub>ON</sub>.

Table 2 Delay parameters [ns] for various cluster sizes.

| Node to node | CS = 2 | CS = 4 | CS = 6 | CS = 8 |
|--------------|--------|--------|--------|--------|
| 1to2         | 0.171  | 0.205  | 0.225  | 0.258  |
| 1to3         | 0.265  | 0.328  | 0.355  | 0.412  |
| 1to4         | 0.268  | 0.327  | 0.356  | 0.414  |
| 5to3, 6to1   | 0.277  | 0.321  | 0.337  | 0.377  |
| 5to4         | 0.279  | 0.320  | 0.337  | 0.376  |






Fig.6 Die photo of the optimized NB-FPGA with 20x20 LEs.



Fig.7 Layouts of NB-FPGAs.



Table 3 Properties of NB-FPGAs.

|                                   | NB-FPGA (non-optimal cluster size = 2) | NB-FPGA (optimal cluster size = 4) |
|-----------------------------------|----------------------------------------|------------------------------------|
| Switch                            | NanoBridge                             | NanoBridge                         |
| Process node                      | 65 nm                                  | 65 nm                              |
| Count of used LUTs                | 469                                    | 469                                |
| Count of used clusters            | 241                                    | 121                                |
| Area                              | 1.35 mm<sup>2</sup>                    | 1.01 mm<sup>2</sup>                |
| Delay at 0.8 V                    | 30 ns                                  | 28 ns                              |
| Power consumption at 0.8 V, 32 MHz | 1.792 mW                              | 1.632 mW                           |
| Energy at 0.8 V                   | 53.76 pJ                               | 45.7 pJ                            |





Fig.9 Delay, power and energy vs. supply voltage in NB-FPGAs. Panel (c): energy vs. supply voltage.