## Compact Multi-Bit Encoder for High Speed Frequency-Mapping Associative Memory

Seiryu Sasaki, Masahiro Yasuda, Akio Kawabata, Tetsushi Koide and Hans Jürgen Mattausch Research Institute for Nanodevice and Bio Systems, Hiroshima University, Kagamiyama 1-5-2, Higashi-Hiroshima, Japan Phone:+81-82-424-6265 E-mail: sasaki-seiryu@hiroshima-u.ac.jp.

## 1. Introduction

Applications involving pattern matching, recognition and learning rely on nearest distance search [1-2]. The associative memory, which implements this functionality of nearest-match detection between an input-data and a number of reference data, is an ideal VLSI hardware solution for such applications. We have developed a scalable and reliable associative-memory architecture utilizing a mapping operation of distances into frequency space for detecting distance differences as signal delays [3]. Distance-frequency mapping is realized with ring oscillators programmable in discrete frequency steps. This circuit implementation enables a decrease of process-induced variation effects by design of the delay steps and by applying frequency dividers at the ring-oscillator outputs. For minimum search-time reduction, an optimization of the ring-oscillator's inverter number for the case of complete matching is indispensable. The reported multi-bit encoding method achieves about 1.7 times improvement for case of 2-bit encoding and further improvements roughly proportional to the bit number. A compact and scalable circuit concept is used to minimize the area penalty of the multi-bit encoder.

# 2. Frequency Mapping Associative Memory

## 2. 1. Minimum Distance Search Process

The dataflow of the matching process in a frequency-mapping associative memory is given in Fig 1. In a preparatory step, the reference patterns are stored in the memory's storage cells. When input data for the matching process is supplied, each unit comparator calculates the distance (in this report the Hamming distance) between the corresponding reference-data bits and the input-data bits. Bit-comparison results are connected to the adjustable ring-oscillator stages where the Hamming distance is converted into frequency domain. A search operation begins by starting all ring-oscillators simultaneously with a search-begin signal (SB). The time domain WTA circuit uses the ring oscillator outputs after frequency division for winner decision and generates a search-end signal (SE) to turn off all ring-oscillators.

## 2.2. Distance Frequency Converter

Distance-frequency conversion is done in the adjustable stages of the ring oscillators, which consist in a straight forward design of 1-bit stages as shown Fig 2. Each 1-bit stage has two paths, one for distance 0 (short path) and the other for distance 1 (long path).

For each non-matching bit a designable fixed time  $\tau_S$  is added to the ring-oscillator delay. Hamming distance  $n_H$  is therefore transformed into frequency domain as  $f(n_H)=[2(\tau_0+n_H\cdot\tau_S)]^{-1}$ , where  $[2\tau_0]^{-1}$  is the basic ring-oscillator frequency for  $n_H=0$ .  $\tau_S$  is a design variable, which is freely adjustable for the purpose of compensating process variations and the time-resolution limits of the time domain Winner-Take-All (WTA) circuit.

#### 3. Bit-Encoding Concept and Multi-Path Delay Stages

Figs. 2 and 3 explain the encoding concept for the cases of 1-bit and 2-bit encoding with 2-path and 3-path delay stages, respectively. With 2-bit encoding the number of delay stages of the ring-oscillator is reduced by a factor 2 in comparison to 1-bit encoding, which means that the basic ring-oscillator delay can also be reduced by approximately 2 times, i.e.  $\tau_0(2\text{-bit}) \sim \tau_0(1\text{-bit})/2$ . In the general case of N-bit encoding, the reduction of the basic ring-oscillator delay increases to about a factor N, which means that the minimum search time is reduced by a factor N. Furthermore, the reduced number of switching inverters translates into a reduction of the required energy per search.

Fig 4 shows the die photos of 2 test chips in 180nm CMOS with 2-path and 3-path delay stages. Table 1 compares the performance of these test chips. The comparison verifies the effectiveness of the encoding method in real designs. Test chip 1 uses 2-path delay stages and has a capacity of 64 words, each consisting of 256 bits. Test chip 2 implements 2-bit encoding with 3-path delay stages and has a capacity of 128 words, each consisting of 512 bits. Normalized to the bit-number per word, test chip 2 achieves a 1.7 times improvement of the minimum search time (and therefore  $\tau_0$ ) over test chip 1 due to implementation of the 2-bit encoding concept.

#### 4. Multi-Bit Encoder for Path Selection

While the concept of multi-path delay stages substantially reduces minimum distance search time and search energy, the additional area of the multi-bit encoder is a concern. In case of 2-path delay stages, the outputs from the Unit Comparators (UCs), which are bit comparators for the Hamming-distance, can be directly used for path selection. However, when we implement multi-path delay stages, an additional multi-bit encoder is needed for selecting the correct path. Here, we report a compact N-bit encoder based on cascaded selectors according to the recursive concept explained in Fig. 5a. The number of additional delay units  $\tau_s$ , and therefore the correct path, is determined by increasing the count of matching bits in the cascade level N by 1 if the comparator output bit M<sub>N</sub> indicates a match (M<sub>N</sub>=1) and by otherwise leaving the count unchanged. The 1<sup>st</sup> stage of the cascaded selector, uses logic gates for counting the number of matching bits as shown in Fig. 5b.

Fig 6 compares the developed selector-based multi-bit encoder to a conventional logic gate implementation for up to 6-bit encoding. An area reduction of about 36% is verified when library-based synthesis is applied. Actually achieved areas in a full-custom design are also shown for comparison.

#### 5. Conclusion

We have developed a multi-bit encoding concept to enable the application of multi-path ring-oscillator stages in high-speed and low power associative memories based on frequency mapping. The effectiveness of this concept was verified with complete associative memory designs in 180nm CMOS, implementing 2-path and 3-path delay stages with 1-bit and 2-bit encoding, respectively. Normalized to the bit-number per word, a minimum search time improvement by a factor 1.7 was achieved for the case of 3-path delay stages. In the case of multi-path encoding with higher bit numbers an area reduction by 36% in comparison to a conventional logic gate implementation is verified for the newly developed selector-based multi-bit encoder.

#### Acknowledgment

VLSI-chip fabrication was supported by VDEC, the University of Tokyo in collaboration with Semiconductor Technology Academic Research Center (STARC), Rohm CO., LTD., Cadence Designs Systems Inc., Synopsys Inc., Mentor Graphics Co.

#### References

- [1] D. R. Tveter, et al., Proc. IEEE CICC (2004), pp. 295-298. [2] A. Ahmadi, et al., Proc. IEEE CIISP (2007), pp. 214-219.
- [3] H. J. Mattausch et al., Proc. ESSCIRC 2010, pp.538-541.







Fig 2. Distance – frequency conversion in the ring-oscillator with 2 path delay stages and 1-bit encoding.



Fig 3. Distance – frequency conversion with 3-path delay stages and 2-bit encoding.

Table 1 Comparison of Test chip.

|                   | Test Chip 1                        | Test Chip 2                       |
|-------------------|------------------------------------|-----------------------------------|
| CMOS Process      | 180 nm (5Al)                       | 180 nm (5Al)                      |
| Vdd Supply (V)    | 1.8 V                              | 1.8 V                             |
| Oeganization      | 64-word, 256-bit                   | 128-word, 512-bit                 |
| 2bit Encoding     | Unused                             | Used                              |
| Delay $\tau_s$    | 770 ps                             | 1070 ps                           |
| Search Time (ns)  | 50 – 245 ns<br>3.05 – 14.95 ps/bit | 60 – 575 ns<br>0.91 – 8.78 ps/bit |
| Power Dissipation | 22.28 – 36.54 mW                   | 41.94 mW                          |
|                   | 1.36 - 2.23 mW/bit                 | 0.64 mW/bit                       |
| Area              | 3.08 mm <sup>2</sup>               | 12.58 mm <sup>2</sup>             |



Fig 6. Area consumption of the developed multi-bit encoder in comparison to a conventional logic gate implementation as a function of bit number.









