# Atom-Switch-based FPGA with Optimized Driving Capability Buffer

X. Bai, M. Miyamura, Y. Tsuji, R. Nebashi, A. Morioka, N. Banno, K. Okamoto, H. Numata, H. Hada, T. Sugibayashi, T. Sakamoto, and M. Tada

NEC Corporation, Miyukigaoka, Tsukuba, Ibaraki 305-8501, Japan. E-mail: x-bai@bc.jp.nec.com

## Abstract

The atom-switch FPGA (AS-FPGA) uses a simple crossbar switch structure for signal routing. Unlike a CMOS multiplexer, the AS crossbar switch achieves shorter delay thanks to single-stage routing and the low capacitance of the AS. However, when an application circuit is mapped onto the AS-FPGA, the load capacitance at the crossbar switch output varies with its fan-out. We therefore carefully optimize the buffer driving capability for the crossbar, which is crucial for signal delay. Our simulation results show that the energy-delay product (EDP) of the AS crossbar switch using optimized driving capability buffers is reduced by 22.3% compared with the non-optimized one.

## 1. Introduction

FPGAs have been widely used in IoT systems, including cloud applications in data centers and sensor applications at the edge. The large circuit area, signal delay, and power consumption of conventional SRAM-based FPGAs (SRAM-FPGAs) limit their integration into IoT systems, especially in power-limited edge applications. To overcome these issues, a 40-nm atom-switch (AS) FPGA was proposed, demonstrating 2x logic density and 3x power efficiency compared with a commercial 40-nm low-power SRAM-FPGA [1].

In this paper, we investigate the buffer driving capability in the AS-FPGA to improve its performance. Since a crossbar switch drives a varying number of fan-outs (FO) depending on the application circuit, its driving capability must be chosen carefully. We evaluate the driving capability in terms of active energy and delay, combined as the energy-delay product (EDP). We also compare the performance of the optimized AS-FPGA with that of the previous non-optimized AS-FPGA and commercial FPGAs to demonstrate its high energy efficiency.
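For reference, the figure of merit can be written as below, where $E_{\mathrm{active}}$ is the active energy per cycle and $t_{d}$ is the signal delay (these symbol names are ours, not the paper's). A quoted EDP reduction is measured relative to a reference design; for example, a 22.3% reduction corresponds to $\mathrm{EDP}_{\mathrm{opt}} = 0.777\,\mathrm{EDP}_{\mathrm{ref}}$.

$$
\mathrm{EDP} = E_{\mathrm{active}} \times t_{d}, \qquad
\Delta_{\mathrm{EDP}} = 1 - \frac{\mathrm{EDP}_{\mathrm{opt}}}{\mathrm{EDP}_{\mathrm{ref}}}
$$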

## 2. Architecture of AS-FPGA and its buffer driving capability optimization

Figure 1 shows the architecture of the AS-FPGA, which is composed of logic blocks (LBs), routing blocks (RBs), and routing wires. Each LB consists of four look-up tables (LUTs) and four D flip-flops. All LBs are interconnected through routing wires and RBs. Optimizing the buffer driving capability in the RB can therefore improve data transfer performance in the AS-FPGA.
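To make the LB organization concrete, here is a minimal behavioral sketch. It assumes 4-input LUTs (the input count `K` is a hypothetical choice; the text does not state it) and a hypothetical per-output flag that selects either the LUT output or the D flip-flop output. The truth-table bits stand in for the LUT configuration memory, which the paper builds from ASs. This is an illustration only, not the authors' architecture description.

```python
from dataclasses import dataclass, field
from typing import List, Sequence

K = 4  # hypothetical LUT input count; not stated in the paper


@dataclass
class LUT:
    """K-input look-up table: a 2**K-entry truth table held in configuration memory."""
    truth_table: List[int]

    def __post_init__(self) -> None:
        if len(self.truth_table) != 2 ** K:
            raise ValueError(f"expected {2 ** K} configuration bits")

    def evaluate(self, inputs: Sequence[int]) -> int:
        # The input bits form the table address; input 0 is the least significant bit.
        address = sum(bit << i for i, bit in enumerate(inputs))
        return self.truth_table[address]


@dataclass
class LogicBlock:
    """Four LUTs, each paired with a D flip-flop."""
    luts: List[LUT]
    registered: List[bool]  # hypothetical: True -> output the DFF, False -> bypass it
    dff_state: List[int] = field(default_factory=lambda: [0, 0, 0, 0])

    def outputs(self, lut_inputs: Sequence[Sequence[int]]) -> List[int]:
        comb = [lut.evaluate(x) for lut, x in zip(self.luts, lut_inputs)]
        return [q if reg else c for c, q, reg in zip(comb, self.dff_state, self.registered)]

    def clock(self, lut_inputs: Sequence[Sequence[int]]) -> None:
        # On a clock edge, every DFF captures its LUT's combinational output.
        self.dff_state = [lut.evaluate(x) for lut, x in zip(self.luts, lut_inputs)]
```

For instance, `LUT([0] * 15 + [1])` configures a 4-input AND gate; the registered/bypass selection is our assumption about how each LUT and DFF pair is used.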

The AS, which has a very high off/on resistance ratio, is fabricated between the metal 4 and metal 5 layers (Fig. 2). Two serially connected ASs provide a low programming voltage and high off-state reliability [2]. ASs are used to construct the RBs, LUT memories, and various control memories, so users can configure ASs to realize the desired interconnections in RBs and the desired functions in LUTs.

Figure 3 shows the RBs used in the SRAM-FPGA and the AS-FPGA, taking an 8-to-4 RB as an example. As shown in Fig. 3(a), since each SRAM cell (typically six transistors) incurs large power and area overhead, the SRAM-based RB uses multi-stage multiplexers (MUXs) to minimize the SRAM cell count [3]. Buffer A drives a fixed number of MUXs, so its load capacitance is fixed. In contrast, our AS-based RB adopts a simple crossbar switch with single-stage routing, made possible by the extremely small area overhead and capacitance of the AS (~1/10 of CMOS). The load capacitance of a buffer driving the AS crossbar therefore depends on how many of the switches S0~S3 are turned on, i.e., on its FO. It is thus necessary to optimize the driving capability for various FO values.
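The contrast between the two RBs can be illustrated by modeling the AS crossbar configuration as an 8 x 4 boolean matrix, where entry (i, j) is ON when input i is connected to output j. In this sketch, the FO of an input buffer is the number of ON switches in its row, and its load capacitance grows linearly with that count, whereas buffer A in the MUX-based RB always sees the same load. The capacitance values, the linear load model, and all function names are hypothetical placeholders, not figures or code from the paper.

```python
from typing import List

N_IN, N_OUT = 8, 4  # 8-to-4 routing block, as in the paper's example

# Hypothetical capacitances in fF; placeholders, not values from the paper.
C_ROW_FF = 2.0  # input (row) wire plus the OFF-state switches hanging on it
C_COL_FF = 5.0  # one output (column) wire and its downstream load, added per ON switch


def fan_out(config: List[List[bool]]) -> List[int]:
    """FO of each input buffer = number of ON switches (S0..S3) in its crossbar row."""
    if len(config) != N_IN or any(len(row) != N_OUT for row in config):
        raise ValueError("config must be an 8 x 4 boolean matrix")
    return [sum(row) for row in config]


def load_capacitance_ff(fo: int) -> float:
    """First-order load seen by an input buffer: grows linearly with FO."""
    return C_ROW_FF + fo * C_COL_FF


def check_single_driver(config: List[List[bool]]) -> None:
    """Each output column may be connected to at most one input row."""
    for j in range(N_OUT):
        drivers = sum(config[i][j] for i in range(N_IN))
        if drivers > 1:
            raise ValueError(f"output {j} has {drivers} drivers")
```

For example, connecting input 0 to outputs 0 and 1 gives that buffer FO = 2 and, under these placeholder values, a 12 fF load, while an unused input sees only the row capacitance.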

There is a trade-off between delay and active energy per cycle with respect to the buffer driving capability: a stronger buffer shortens the delay but consumes more energy. We therefore investigate the EDP for different FO values. We perform SPICE simulations of the AS crossbar driven by 40-nm buffers, where the x1 buffer, which has the minimum driving capability, uses the smallest CMOS transistors. Figure 4 shows the delay, active energy per cycle, and EDP for different buffer sizes. The EDP is minimized with the x12 buffer for all FO values of 1, 2, 3, and 4, and is reduced by 22.3% compared with the previous AS crossbar using x4 buffers [1].
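The trade-off can be reproduced qualitatively with a first-order RC model and a sweep over buffer sizes, in the spirit of the SPICE sweep but far cruder. This sketch assumes that a size-s buffer has output resistance R0/s and input and self capacitances s times those of the x1 buffer, that the stage driving it has a fixed resistance, and that the load grows linearly with FO. Every parameter value is a hypothetical placeholder, and the ON resistance of the AS is ignored. Unlike the paper's SPICE result, where x12 is optimal for every FO from 1 to 4, the optimum in this toy model shifts with FO, which shows how strongly the conclusion depends on real device and wire parameters.

```python
from typing import Iterable, Tuple

# Hypothetical first-order parameters (not extracted from the paper).
VDD_V = 1.1       # supply voltage
R0_KOHM = 10.0    # output resistance of the x1 buffer
R_UP_KOHM = 0.5   # resistance of the stage driving the buffer input
C_IN_FF = 1.0     # input capacitance of the x1 buffer
C_PAR_FF = 1.0    # self (drain) capacitance of the x1 buffer
C_ROW_FF = 2.0    # row wire plus OFF-state switches
C_COL_FF = 5.0    # added load per ON switch (FO)


def load_ff(fo: int) -> float:
    return C_ROW_FF + fo * C_COL_FF


def delay_ps(size: int, fo: int) -> float:
    """Elmore-style delay: upstream stage charging the buffer input, then the
    buffer charging its own parasitics plus the load (kOhm * fF = ps)."""
    upstream = R_UP_KOHM * size * C_IN_FF
    own = (R0_KOHM / size) * (size * C_PAR_FF + load_ff(fo))
    return 0.69 * (upstream + own)


def energy_fj(size: int, fo: int) -> float:
    """Energy per cycle, C_total * VDD^2 (fF * V^2 = fJ), assuming one full
    charge/discharge of every node per cycle."""
    c_total = size * (C_IN_FF + C_PAR_FF) + load_ff(fo)
    return c_total * VDD_V ** 2


def edp(size: int, fo: int) -> float:
    return delay_ps(size, fo) * energy_fj(size, fo)


def best_size(fo: int, sizes: Iterable[int] = range(1, 33)) -> Tuple[int, float]:
    """Sweep candidate buffer sizes (x1..x32) and return (size, EDP) at the minimum."""
    return min(((s, edp(s, fo)) for s in sizes), key=lambda pair: pair[1])


def reduction_vs(baseline: int, fo: int) -> float:
    """Fractional EDP reduction of the best size relative to a baseline size."""
    _, best = best_size(fo)
    return 1.0 - best / edp(baseline, fo)


if __name__ == "__main__":
    for fo in (1, 2, 3, 4):
        s, value = best_size(fo)
        print(f"FO={fo}: best x{s}, EDP={value:.1f} fJ*ps, "
              f"{100 * reduction_vs(4, fo):.1f}% below x4")
```

Comparing against a baseline of x4 mirrors how the paper reports its 22.3% figure, but the numbers printed by this toy model should not be read as estimates of the real circuit.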

## 3. Performance evaluation

We evaluate the performance of the AS-FPGA with optimized buffers by mapping an arithmetic logic unit (ALU) application circuit [4]. For the evaluation, we use an in-house static timing analysis (STA) tool. Table 1 summarizes the performance comparison with the previous non-optimized AS-FPGA and commercial FPGAs. The measured EDP of the previous AS-FPGA is 51%, 17%, and 39% of that of a 40-nm SRAM-FPGA, a 55-nm SRAM-FPGA, and a 65-nm FLASH-FPGA, respectively. With the optimized buffer driving capability, the STA results show a further 4% EDP reduction for the AS-FPGA.
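As a rough back-of-the-envelope check, assuming that the 4% STA-estimated reduction carries over multiplicatively to the measured ratios (our assumption, since one figure is simulated and the others are measured), the EDP of the optimized AS-FPGA relative to each commercial FPGA $X$ would be approximately:

$$
\frac{\mathrm{EDP}_{\mathrm{AS,opt}}}{\mathrm{EDP}_{X}} \approx (1 - 0.04)\,\frac{\mathrm{EDP}_{\mathrm{AS,prev}}}{\mathrm{EDP}_{X}}
$$

$$
0.96 \times 0.51 \approx 0.49, \qquad 0.96 \times 0.17 \approx 0.16, \qquad 0.96 \times 0.39 \approx 0.37
$$

for the 40-nm SRAM-FPGA, the 55-nm SRAM-FPGA, and the 65-nm FLASH-FPGA, respectively. These are derived estimates, not values from Table 1.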

## 4. Conclusions

The buffer driving capability of the routing block in an atom-switch-based FPGA is optimized to minimize the energy-delay product. The optimized buffer will be used in the design of the next-generation 28-nm atom-switch-based FPGA.

## Acknowledgements

Part of this work was supported by NEDO. Part of the device processing was performed at AIST, Japan.

## References

- [1] X. Bai et al., VLSI Tech., pp. 28-29 (2017).
- [2] M. Tada et al., IEDM, p. 689 (2011).
- [3] J. H. Anderson et al., TVLSI, vol. 17, no. 8, p. 1048 (2009).
- [4] openMSP430, OpenCores, https://opencores.org/project/openmsp430/




Notes to Table 1: \*I/O pad delay is not included. \*\*Clock tree energy and static energy are not included.