# A Compact and Power-Efficient Implementation of Rank Order Filters Using Time-Domain Digital Computation Technique

Liem T. Nguyen, Kiyoto Ito and Tadashi Shibata

Department of Frontier Informatics, School of Frontier Sciences, The University of Tokyo, 5-1-5 Kashiwanoha, Kashiwa-shi, Chiba 277-8561, Japan. Phone: +81-4-7136-3853, Fax: +81-4-7136-3855, Email: {nguyen, kiyoto}@else.k.u-tokyo.ac.jp, shibata@ee.t.u-tokyo.ac.jp

## 1. Introduction

Rank order filters (ROFs) have been widely applied to various speech and image processing systems due to their capability of suppressing noise without blurring the original signal. For instance, median filters, the most popular and important type of ROFs, are very effective in removing impulsive noise, such as speckle noise, while preserving sharp edges usually found in images. Other types of particular interest are min/max filters, which play essential roles in template matching for pattern recognition.

Typically, ROFs have been implemented as software programs, where processing is carried out on digital data supplied from a separate image sensor chip after A/D conversion. If focal-plane image processing technology is employed, however, the filtering functions can be implemented directly on the image sensor chip. Such an approach would provide the most efficient way of implementing ROFs in terms of compactness, speed and power consumption.

Several dedicated analog [1][2] and digital [3][4] hardware implementations of ROFs have been proposed. The former, though often compact, suffer from inherently low computational accuracy. The latter, on the other hand, provide high accuracy at the expense of chip area and power dissipation, which makes them poorly suited to focal-plane image processing.

In this paper, a digital implementation of ROFs using the time-domain digital computation technique [5] is presented. The pixel intensity is represented as a pulse width, and rank order filtering is carried out using simple digital adders and a binary counter. As a result, the compactness and power efficiency of analog implementations and the accuracy of digital implementations are achieved simultaneously. An 8-bit, 16-input proof-of-concept chip was designed and fabricated in a 0.35 $\mu$m CMOS technology. Compared with the published digital counterparts in [3][4], the circuit features low power dissipation (0.44mW at 3.3V) and a compact core size (0.014mm<sup>2</sup>), while providing sufficient speed (260,000 ranks/s) and accuracy (8-bit).

## 2. Time-Domain Computation Architecture

### Analog-to-Time Conversion

In this architecture, time-domain signals (pulse-width modulated signals) are employed as inputs to the filter. The conversion of analog pixel intensities to time-domain signals is carried out by analog-to-time converters (ATCs), as shown in Fig. 1. A sample voltage is applied to the positive node of the comparator, while the common ramp voltage is given to the negative node. At the beginning of conversion, the ramp voltage is lower than any sample voltage, which sets the comparator output high. The ramp voltage is then swept linearly until it reaches the supply voltage. As the ramp voltage crosses the sample voltage, the comparator output changes to low. In this way, the analog voltage is converted to the width of a pulse signal.

### Circuit Configuration

Fig. 2 The proposed rank order filtering architecture

Fig. 2(a) illustrates the schematic of the proposed rank order filtering architecture. The circuit is composed of a 1-bit 16-input carry save adder (CSA), a 5-bit subtractor and an 8-bit binary counter. First, the output of the CSA (SUM) gives the number of inputs that are currently high. SUM and the rank order R are then compared by the subtractor. While SUM is at least R, the difference is non-negative, which sets the subtractor carry out (Y) high. Once SUM falls below R, the difference becomes negative and Y turns low. Consequently, Y carries a pulse whose width is identical to that of the input corresponding to the rank order. Finally, the binary counter measures the width of this pulse and thus yields the rank-order-filtered value in digital form.

A timing chart example of 3rd-order (i.e. R=3) filtering of 5 inputs is shown in Fig. 2(b). While IN[3], the 3rd largest input, is still high, SUM $\geq$ 3, which keeps Y high. When IN[3] turns low, SUM < 3, and Y also turns low. Thus Y is a pulse-width signal identical to IN[3]. Since Y is connected to the counter enable node (ENBL), the digital value of the pulse width of IN[3] is available as the counter value when the cycle of operation ends.
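The behaviour described above can be sketched in software. The following is a minimal cycle-by-cycle model, not the authors' circuit: the function names `atc_pulse_width` and `rank_order_filter`, the ideal 0 V to supply ramp, and the 256-cycle conversion window are assumptions made for illustration.

```python
from typing import Sequence


def atc_pulse_width(sample_v: float, vdd: float = 3.3, n_cycles: int = 256) -> int:
    """Idealized analog-to-time conversion (assumed model).

    Assumes a linear ramp from 0 V to vdd over n_cycles clock cycles; the
    comparator output is high while the ramp is below sample_v. Returns the
    number of clock cycles the output stays high, clamped to n_cycles - 1.
    """
    if not 0.0 <= sample_v <= vdd:
        raise ValueError("sample voltage must lie within the supply range")
    return min(n_cycles - 1, int(sample_v / vdd * n_cycles))


def rank_order_filter(widths: Sequence[int], rank: int, n_cycles: int = 256) -> int:
    """Cycle-by-cycle model of the adder / subtractor / counter datapath."""
    if not 1 <= rank <= len(widths):
        raise ValueError("rank must be between 1 and the number of inputs")
    counter = 0
    for t in range(n_cycles):
        # Each input is a pulse that is high during its first `width` cycles.
        total = sum(1 for w in widths if t < w)  # adder output SUM
        y = total >= rank                        # subtractor carry out Y
        if y:                                    # Y drives the counter enable
            counter += 1
    return counter


# Five inputs, R = 3, mirroring the structure of the Fig. 2(b) example
# (the pulse widths themselves are made-up values).
widths = [200, 150, 100, 50, 25]
print(rank_order_filter(widths, rank=3))  # prints 100, the 3rd largest width
```

In this model, setting `rank=1` returns the maximum and `rank=len(widths)` the minimum, so the same datapath covers max, median and min filtering by changing only R.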

## 3. Experimental Results and Discussion

As a proof-of-concept chip, an 8-bit 16-input rank order filter was designed and fabricated in a 0.35 $\mu$m 2P3M CMOS technology. A photomicrograph of the chip is shown in Fig. 3. The block labeled "address decoder" is additional circuitry that identifies the location of the input corresponding to the rank order. With this function, the chip can perform more advanced operations, such as those required by a winner-take-all circuit for pattern classification.

The core size without and with the address decoder is 0.014mm<sup>2</sup> and 0.029mm<sup>2</sup>, respectively, which is about 140 times smaller than the published digital counterpart in [3], which used a similar 0.35 $\mu$m technology. The power dissipation excluding and including input/output buffers is 0.44mW and 12mW, respectively, giving a power efficiency close to that of the analog implementation in [1]. The maximum operating frequency observed in measurement is 68MHz with a 3.3V power supply. Since each rank operation takes 256 clock cycles, the chip performs over 260,000 ranks/s.
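The quoted throughput follows from the measured clock frequency and the cycle count per rank operation. The 256 cycles presumably correspond to the $2^8$ steps of the 8-bit counter; this is an interpretation and is not stated explicitly in the text.

$$
\frac{68 \times 10^{6}\ \text{cycles/s}}{256\ \text{cycles/rank}} = 265{,}625\ \text{ranks/s} \approx 2.66 \times 10^{5}\ \text{ranks/s}
$$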

Fig. 4 shows the measured waveforms from (a) a logic scope and (b) an oscilloscope. Rank order filtering of a set of 16 inputs (hex: FF, F0, E0, D0, C0, B0, A0, 90, 80, 70, 60, 50, 40, 30, 20, 10) was measured with rank orders of 11 and 8, respectively. Due to the limited resolution of the logic scope, Fig. 4(a) shows the chip operating at 1.2MHz, while 68MHz operation is demonstrated in Fig. 4(b).
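As a software cross-check for this input set, a simple sort gives the values a rank order filter would be expected to return. This sketch assumes that rank R selects the R-th largest input, as in the Fig. 2(b) example; the helper `expected_rank_value` is hypothetical, and the printed values are computed references, not readings taken from Fig. 4.

```python
# Input set quoted in the text (hexadecimal codes).
inputs = [0xFF, 0xF0, 0xE0, 0xD0, 0xC0, 0xB0, 0xA0, 0x90,
          0x80, 0x70, 0x60, 0x50, 0x40, 0x30, 0x20, 0x10]


def expected_rank_value(values, rank):
    """Software reference: the rank-th largest value (assumed convention)."""
    return sorted(values, reverse=True)[rank - 1]


for rank in (11, 8):
    print(f"R={rank}: expected 0x{expected_rank_value(inputs, rank):02X}")
# R=11: expected 0x60
# R=8: expected 0x90
```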

## 4. Conclusion

A digital implementation of ROFs using the time-domain digital computation technique has been developed. The architecture has features of compactness and power efficiency as compared with published digital works, while still presenting sufficient speed and accuracy.

#### Acknowledgements

The VLSI chip in this paper was fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation.



Fig. 3 Chip photomicrograph



Fig. 4 Measured waveforms

#### References

[1] B. P. Tan and D. M. Wilson, IEEE Trans. Circuits and Systems II: Analog and Digital Signal Processing, vol.48, no.2, Feb. 2001.

[2] J. Poikonen and A. Paasio, IEEE Trans. Circuits and Systems, vol.51, no.5, May 2004.

[3] L. R. Dung and M. C. Lin, IEEE Trans. Consumer Electronics, vol.50, no.2, May 2004.

[4] V. A. Pedroni, in Proc. 2004 IEEE International Symposium on Circuits and Systems, vol.2, May 2004.

[5] K. Ito and T. Shibata, in Proc. 2006 IEEE International Symposium on Circuits and Systems, May 2006.