# Low-Complexity Time-Domain Winner-Take-All Circuit with High Time-Difference Resolution Limited Only by Within-Die Variation

M. Yasuda, T. Ansari, W. Imafuku, A. Kawabata, T. Koide, and H. J. Mattausch

Research Institute for Nanodevice and Bio Systems, Hiroshima University, Higashi-Hiroshima, 739-8527, Japan. Phone: +81-82-424-6265. E-mail: yasuda-masahiro@hiroshima-u.ac.jp

### 1. Introduction

Pattern matching is necessary for applications such as image/speech recognition, data compression and artificial intelligence. A nearest-distance search between the input pattern and a set of reference patterns is the common way of realizing pattern matching. The most similar reference pattern, i.e. the one with the smallest distance to the input pattern, is normally called the winner pattern or just the winner.

Conventional methods for nearest-distance search are often sequential and processor-based [1]. Mixed digital/analog and purely digital fully-parallel solutions [2-4] have also been reported. Here, we present a time-domain Winner-Take-All (WTA) circuit for fully-parallel nearest-Hamming-distance search, based on mapping the distance into the time domain with adjustable ring oscillators. This WTA circuit is scalable to small design rules and low supply voltages and is highly robust against fabrication-related variations. The time-domain WTA circuit detects the first arriving ring-oscillator output signal and thus determines the winner. High time-difference resolution is important for distinguishing the correct winner signal from the loser signals.

### 2. Time-Domain Nearest-Distance Search

Fig. 1 shows a block diagram of the fully parallel nearest-Hamming-distance search concept in the time domain. The architecture includes search-data storage cells (SC), unit comparators (UC), adjustable ring oscillators for word comparison, a row/column decoder, a read/write circuit and the time-domain WTA circuit. The adjustable ring oscillators consist of delay stages, each of which can add a time delay $\tau_S$ for a non-matching bit (see Fig. 2). Consequently, signals move around each ring oscillator with a delay of $\tau_0 + n_H \cdot \tau_S$ per loop, where $n_H$ is the corresponding Hamming distance and $\tau_0$ is the basic ring-oscillator delay for $n_H = 0$. The delay $\tau_S$ is a freely adjustable design variable, which can be used to compensate for process variations and for the time-resolution limit of the time-domain WTA circuit. The time-domain WTA circuit uses the ring-oscillator outputs for the winner decision and generates a search-end signal (SE) to turn off all ring oscillators. A concept for distance mapping into the time domain based on a linear inverter chain with adjustable inverter-driving capability, and thus less delay-size adjustability, has been reported in [5].
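The delay mapping described above can be sketched at the behavioural level as follows. This is a minimal, variation-free model and not the circuit implementation: the function names, the 8-bit example words and the values chosen for $\tau_0$ and $\tau_S$ (in ps) are illustrative assumptions, as is the simplification that the arrival time after $k$ loops equals $k$ times the per-loop delay.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two equal-width words differ."""
    return bin(a ^ b).count("1")


def loop_delay(n_h: int, tau_0: float, tau_s: float) -> float:
    """Per-loop ring-oscillator delay tau_0 + n_H * tau_S."""
    return tau_0 + n_h * tau_s


def arrival_times(search_word, references, tau_0, tau_s, loops=1):
    """Ideal arrival time of each row's ring-oscillator output.

    Assumption: after `loops` round trips the arrival time is simply
    loops * (tau_0 + n_H * tau_S); within-die variation is ignored.
    """
    return [
        loops * loop_delay(hamming_distance(search_word, ref), tau_0, tau_s)
        for ref in references
    ]


# Hypothetical example values (delays in ps), not taken from the paper.
references = [0b10110010, 0b10110100, 0b00001111]  # Hamming distances 1, 3, 5
search_word = 0b10110011

one_loop = arrival_times(search_word, references, tau_0=500.0, tau_s=200.0)
two_loops = arrival_times(search_word, references, tau_0=500.0, tau_s=200.0, loops=2)

# The row with the earliest arrival is the winner.
winner = min(range(len(one_loop)), key=one_loop.__getitem__)
print(winner, one_loop, two_loops)
```

In this model the winner-loser time difference grows with both $\tau_S$ and the number of loops, which illustrates why an adjustable $\tau_S$ can be traded against the resolution limit of the WTA circuit.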

### 3. Developed Time-Domain Winner-Take-All Circuit

The developed time-domain WTA circuit, shown in Fig. 3, is based on a winner detector, edge-triggered resettable registers and adjustable delay stages $\tau_D$ in the signal path to the registers. After a selectable number of loops, the ring-oscillator outputs are connected to the winner detector and, via identical delay stages $\tau_D$, to their corresponding registers. The winner detector detects the first arriving ring-oscillator signal and generates a clock signal for all registers with a delay $\tau_{CLK}$. The data-signal-path delay of each register is designed as $\tau_D = \tau_{CLK} - \tau_{ST} - \tau_{\Delta var}$, where $\tau_{ST}$ is the register set-up time and $\tau_{\Delta var}$ is a safety margin that compensates for relative changes in the different signal-path delays due to process-variation effects. Therefore, if the additional time delay of the signals from loser rows is larger than $\tau_{\Delta var}$, only the correct winner row is registered as winner, while all other rows are correctly registered as losers. $\tau_{\Delta var}$ is thus the winner-loser time-difference resolution of the WTA circuit and is determined by the size of the variation in the process used. For an ideal process without variation it would be possible to choose $\tau_{\Delta var}$ arbitrarily close to 0. The resolution of previously reported time-domain WTA circuits with a feedback loop [6] is limited by the loop delay, even without process variation, because the loop delay is not compensated in the input-signal paths. The resolution of previously reported open-loop time-domain WTA circuits [5, 7] is also limited only by process variations, but due to their higher complexity they need a much larger number of transistors. Additionally, because of the longer relevant signal paths [5] or the use of dynamic circuits [7], their susceptibility to process variations is higher.
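The timing relation above can be checked with a small behavioural sketch. It assumes an idealized register that captures a row as winner exactly when its delayed data signal arrives at least $\tau_{ST}$ before the common clock edge. The function names and all numerical values (in ps) are hypothetical and chosen only for illustration.

```python
def registered_as_winner(t_in, t_first, tau_clk, tau_st, tau_dvar):
    """Idealized capture condition for one row's register.

    t_in    : arrival time of this row's ring-oscillator signal
    t_first : arrival time of the first (winner) signal
    The data path is delayed by tau_D = tau_CLK - tau_ST - tau_dvar,
    as stated in the text; the clock edge occurs at t_first + tau_CLK.
    """
    tau_d = tau_clk - tau_st - tau_dvar
    data_at_register = t_in + tau_d
    clock_edge = t_first + tau_clk
    # Captured only if the set-up time is met before the clock edge.
    return data_at_register + tau_st <= clock_edge


def wta(arrivals, tau_clk, tau_st, tau_dvar):
    """Return, per row, whether it is registered as winner."""
    t_first = min(arrivals)
    return [
        registered_as_winner(t, t_first, tau_clk, tau_st, tau_dvar)
        for t in arrivals
    ]


# Hypothetical values in ps: tau_CLK = 600, tau_ST = 100, margin = 150.
clear_case = wta([700, 1100, 1500], tau_clk=600, tau_st=100, tau_dvar=150)
# A loser only 100 ps behind the winner is inside the 150 ps margin.
unresolved_case = wta([700, 800], tau_clk=600, tau_st=100, tau_dvar=150)
print(clear_case, unresolved_case)
```

Rearranging the capture condition gives $t_{in} - t_{first} \leq \tau_{\Delta var}$, so in this model a loser is rejected only when it trails the winner by more than the safety margin, which is why $\tau_{\Delta var}$ acts as the time-difference resolution.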

We investigated two implementations of the winner detector, which is the key circuit of the time-domain WTA. Fig. 4(a) shows a 2-stage wired-OR detector, which has the advantage of a small number of transistors and a signal path with very few internal nodes. Its disadvantage is its dynamic nature, although this is mitigated by pre-charge and keeper circuits, and the resulting higher susceptibility to process variations. Fig. 4(b) shows an efficient static CMOS implementation with alternating 2-input NOR and NAND stages. In this implementation, the winner-detector delay must not depend on the winner location, which requires eliminating the asymmetric nature of conventional static CMOS gates. Fig. 5 shows the symmetric 2-input NAND gate used in this work, compared to the conventional asymmetric NAND gate. The symmetric NOR gate is constructed according to the same principle.
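At the logic level, a tree of alternating 2-input NOR and NAND stages computes the OR of all inputs, because a NAND of inverted signals equals the OR of the original signals. The sketch below models only this logic function. It assumes active-high ring-oscillator outputs, a power-of-two input count and a final inversion when the number of stages is odd; none of these details are confirmed by the text, and the symmetric transistor-level gate design of Fig. 5 is not modelled.

```python
def nor(a: bool, b: bool) -> bool:
    return not (a or b)


def nand(a: bool, b: bool) -> bool:
    return not (a and b)


def winner_detect(inputs):
    """Logic-level model of an alternating NOR/NAND tree.

    Returns True as soon as any input is True (OR of all inputs).
    Assumptions: active-high inputs, len(inputs) is a power of two >= 2.
    """
    level = list(inputs)
    n = len(level)
    if n < 2 or n & (n - 1):
        raise ValueError("number of inputs must be a power of two >= 2")

    active_high = True  # polarity of the signals entering the next stage
    stages = 0
    while len(level) > 1:
        # NOR combines active-high signals, NAND combines active-low ones.
        gate = nor if active_high else nand
        level = [gate(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        active_high = not active_high
        stages += 1

    out = level[0]
    # With an odd stage count the tree output is active-low; invert it
    # (hypothetical output inverter) to report an active-high result.
    return (out if active_high else not out), stages


idle_64, stages_64 = winner_detect([False] * 64)
row_hit = [False] * 128
row_hit[77] = True  # arbitrary winner location
hit_128, stages_128 = winner_detect(row_hit)
print(idle_64, stages_64, hit_128, stages_128)
```

Every input passes through the same number of stages regardless of its position, so a location-independent detector delay then depends only on each gate having equal switching delay for both of its inputs.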

### 4. Implementation Results in 180nm and 65nm CMOS

We have implemented our time-domain WTA architecture for 64 and 128 input signals in 180nm and 65nm CMOS technology. The 2-stage wired-OR solution was designed only for 64 inputs in 180nm CMOS. The post-layout simulation results for two different designs with 64 inputs in 180nm CMOS and for designs with 128 inputs in both 180nm and 65nm CMOS are listed in TABLE I. The designed minimum winner-loser time-difference resolutions are determined by the estimated delay differences in the critical signal paths due to variation effects, and are in the range from 103ps to 160ps for the 180nm technology and 48ps for the 65nm technology. These time-resolution results are 70% - 88% better than the best previously reported results [7]. The designed time-domain WTA circuits have been implemented in complete associative-memory designs for nearest-Hamming-distance search and were verified to be fully functional in the fabricated chips. Owing to their low complexity, all designed time-domain WTA circuits use only $\leq 3.5\%$ of the total associative-memory area.

### 5. Conclusion

The time resolution of the reported time-domain WTA circuits is limited only by the within-die variation and not by the feedback-loop delay, as is the case for previously reported circuits [6]. In comparison to previous open-loop time-domain WTA circuits [5, 7], they have much lower complexity and thus require only $\leq 3.5\%$ of the area of a complete associative-memory design. The attained time-resolution results of 103ps - 160ps (180nm CMOS) and 48ps (65nm CMOS) in designs for 64 and 128 input signals are 70% - 88% better than the best previously reported results [7].

#### Acknowledgements

The VLSI chips in this study have been fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Semiconductor Technology Academic Research Center (STARC), Rohm CO., LTD., Fujitsu Semiconductors LTD, Cadence Design Systems Inc., Synopsys Inc., Mentor Graphics Co., and Simucad Design Automation Inc.



Fig. 1. Block diagram of time-domain associative memory. SC and UC are abbreviations for storage cell and unit comparator.



Fig. 2. Detailed structure of 1 row of the time-domain associative memory and circuit of the ring-oscillator stage with adjustable delay.

#### References

- [1] Y. Oike, et al., Proc. IEEE CICC (2004), pp. 295-298.
- [2] H.J. Mattausch, et al., IEEE J. Solid-State Circuits, 37, pp. 218-227 (2002).
- [3] M.A. Abedin, et al., Proc. SASIMI (2006), pp. 350-354.
- [4] S. Nakahara, et al., IEEE J. Solid-State Circuits, 40, pp. 276-285 (2005).
- [5] M. Ikeda, et al., Proc. ESSCIRC (1998), pp. 464-467.
- [6] T. Yamashita, et al., Proc. ISSCC (1993), pp. 236-237.
- [7] K. Ito, et al., Jpn. J. Appl. Phys., 41, pp. 2301-2305 (2002).











Fig. 5. Asymmetric and symmetric static CMOS NAND gates. Panels: standard asymmetric NAND gate; symmetric NAND gate with equal switching delay for inputs A and B.

TABLE I. Post-layout simulation results of designed time-domain WTA.

| Winner-detector type                                          | Wired OR | Static CMOS logic gates | Static CMOS logic gates | Static CMOS logic gates |
|---------------------------------------------------------------|----------|-------------------------|-------------------------|-------------------------|
| CMOS process                                                  | 180 nm   | 180 nm                  | 180 nm                  | 65 nm                   |
| Supply voltage (V)                                            | 1.8      | 1.8                     | 1.8                     | 1.2                     |
| Number of inputs                                              | 64       | 64                      | 128                     | 128                     |
| Minimum designed winner-loser time-difference resolution (ps) | 160      | 103                     | 154                     | 48                      |
| Winner decision time (ps)                                     | 1350     | 1120                    | 1430                    | 570                     |
| WTA design area (mm<sup>2</sup>)                              | 0.11     | 0.10                    | 0.22                    | 0.04                    |
| WTA share of total associative-memory area (%)                | 3.5      | 2.9                     | 1.8                     | 2.5                     |