# **Digital Low-Power Real-Time Video Segmentation by Region Growing**

Takashi Morimoto, Osamu Kiriyama, Hidekazu Adachi, Tetsushi Koide, and Hans Jürgen Mattausch

Research Center for Nanodevices and Systems, Hiroshima University, 1-4-2 Kagamiyama, Higashi-Hiroshima, 739-8527, Japan Phone: +81-82-424-6265 Fax: +81-82-422-7185 email: {morimoto, kiriyama, adachi, koide, hjm}@sxsys.hiroshima-u.ac.jp

## 1. Introduction

For object-based image processing such as in the MPEG-7 standard [1] or object-oriented image compression [2], the extraction of objects from complex natural video pictures, called *video segmentation*, is an indispensable first step. However, visual data is generally highly complex and rich in information, so that it is difficult to achieve low-power, real-time segmentation (33msec/frame) with general-purpose hardware such as FPGAs, microprocessors, or digital signal processors (DSPs). Therefore, special-purpose hardware for real-time low-power segmentation is required.

Previously [3, 4], we proposed a high-quality real-time digital architecture for VGA-size video segmentation as well as a *boundary-active-only (BAO)* region-growing concept. BAO is expected to be an efficient power-reduction technique for region-growing-based segmentation that does not sacrifice real-time processing. This paper presents the circuit details of the BAO implementation and a BAO performance evaluation with a full-custom designed CMOS test-chip in 0.35µm technology, including a 41×33 pixel-processing array. The measured segmentation results for 41×33-sized images were 23µsec segmentation time (avg.) and 45.8mW power dissipation (avg.) at 10MHz clock frequency.

## 2. Cell-Network-Based Video Segmentation Architecture

Our video segmentation architecture [3, 4] uses a region-growing algorithm. Inclusion of a pixel *i* in a given segment is decided by examining the pixel's connection-weights $W_{ij}$ with neighboring pixels *j* that are already included in the grown region. For gray-scale images, the connection-weights $W_{ij}$ are functions of the luminance differences $|I_i - I_j|$. For color images, the connection-weights are functions of the maximum of the differences between each of the three RGB components. The details of the algorithm are as follows. First, in the initialization phase, the connection-weights are calculated. Then leader pixels (self-excitable pixels), which are the seeds of the subsequent region-growing process, are determined from the calculated connection-weights. Second, in the segmentation phase by region growing, one of the leader pixels is self-excited and a new region is grown from this leader pixel. In each growing step of the region, excitable pixels are determined with a threshold condition for the sum of connection-weights with excited neighbors, and the pixels fulfilling the excitation condition are automatically excited. The growing steps are repeated as long as excitable pixels exist. When no further excitable pixels are left, the growing process of the respective segment finishes and the excited pixels, constituting the new segment, are labeled and inhibited. Afterwards, the next leader pixel is searched for by a token-passing process and the next segment is grown from this leader pixel. The segmentation process is completed when all initially determined leader pixels are inhibited.
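The two phases described above can be sketched in software as follows. This is only an illustrative sequential model, not the chip's circuit: the weight function `connection_weight`, the leader criterion (sum of all neighbor weights above `phi_leader`), the threshold names `phi_leader` and `phi_excite`, and the use of raster order in place of token passing are all assumptions made for the example, since the text does not specify them.

```python
from typing import Dict, Iterator, List, Tuple

Pixel = Tuple[int, int]
# 8-neighborhood offsets (row, column).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def connection_weight(a: int, b: int, i_max: int = 255) -> float:
    """Assumed weight: large for similar luminance, small across edges."""
    return i_max / (1 + abs(a - b))


def segment(image: List[List[int]], phi_leader: float, phi_excite: float) -> List[List[int]]:
    """Return a label map (0 = unlabeled) grown from leader pixels."""
    rows, cols = len(image), len(image[0])

    def neighbors(p: Pixel) -> Iterator[Pixel]:
        for dr, dc in OFFSETS:
            q = (p[0] + dr, p[1] + dc)
            if 0 <= q[0] < rows and 0 <= q[1] < cols:
                yield q

    pixels = [(r, c) for r in range(rows) for c in range(cols)]

    # Initialization phase: connection-weights, then leader pixels.
    w: Dict[Tuple[Pixel, Pixel], float] = {
        (p, q): connection_weight(image[p[0]][p[1]], image[q[0]][q[1]])
        for p in pixels for q in neighbors(p)
    }
    leaders = [p for p in pixels if sum(w[p, q] for q in neighbors(p)) > phi_leader]

    # Segmentation phase: grow one region per not-yet-labeled leader.
    labels = [[0] * cols for _ in range(rows)]
    segment_id = 0
    for leader in leaders:  # raster order stands in for token passing
        if labels[leader[0]][leader[1]]:
            continue  # leader already labeled and inhibited
        segment_id += 1
        excited = {leader}  # self-excitation of the leader
        while True:
            # One synchronous growing step: every cell checks its threshold
            # condition against the currently excited neighbors.
            newly = {
                p for p in pixels
                if p not in excited and not labels[p[0]][p[1]]
                and sum(w[p, q] for q in neighbors(p) if q in excited) > phi_excite
            }
            if not newly:
                break  # no excitable pixels left: region is complete
            excited |= newly
        for r, c in excited:
            labels[r][c] = segment_id  # label and inhibit the segment
    return labels
```

Note that every growing step in this sketch evaluates all pixels, which mirrors the fully parallel cell-network whose power dissipation motivates the BAO concept.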

Figure 1 shows the construction of the cell-network, which is the core circuit of our video segmentation architecture. It consists of cells, which are processing elements and correspond to the pixels, as well as connection-weight registers, which store the connection-weights. Each cell calculates the sum of the connection-weights with excited neighbors and determines its own new state (self-excited, excited, inhibited, labeled) according to the threshold condition. Due to this parallel processing of all cells, the power-dissipation increases in proportion to the number of cells (pixels). To avoid this increase, we propose the boundary-active-only (BAO) concept.

## 3. BAO Concept and Implementation

The BAO concept, which exploits the characteristics of the region-growing algorithm, is explained with Fig. 2. Due to the stepwise growth of each region, it is sufficient to activate only the cells which have an excitation possibility in the current growth step. Such cells must belong to the boundary of the currently grown region. More specifically, a cell has an excitation possibility only if none of the following three conditions holds: (1) The cell is already excited ($x_{ij}=1$). (2) The cell already has a segment number ($l_{ij}=1$). (3) The cell is not excited and has no segment number, but none of its neighboring cells was excited during the previous clock cycle *t*. In particular, condition (3) means that normally only a part of the complete boundary of the grown segment has an excitation possibility (see Fig. 2).
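The three conditions can be expressed as a simple predicate over the cell states. The following sketch assumes an 8-neighborhood and represents the states as Python sets; the function name `active_cells` and its parameters are hypothetical and chosen only for illustration.

```python
from typing import Set, Tuple

Cell = Tuple[int, int]
# 8-neighborhood offsets (row, column).
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def active_cells(excited: Set[Cell], labeled: Set[Cell],
                 newly_excited: Set[Cell], rows: int, cols: int) -> Set[Cell]:
    """Cells that stay in active mode for the next growth step.

    excited:       cells with x_ij = 1
    labeled:       cells that already have a segment number (l_ij = 1)
    newly_excited: cells that became excited in the previous clock cycle
    """
    active = set()
    for r in range(rows):
        for c in range(cols):
            cell = (r, c)
            if cell in excited:   # condition (1): already excited
                continue
            if cell in labeled:   # condition (2): already has a segment number
                continue
            has_new_neighbor = any((r + dr, c + dc) in newly_excited
                                   for dr, dc in OFFSETS)
            if not has_new_neighbor:  # condition (3): nothing changed nearby
                continue
            active.add(cell)
    return active
```

All cells outside the returned set can remain in stand-by mode, because their sum of connection-weights with excited neighbors cannot have changed since the previous step.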

We implemented a BAO controller in each network cell, which realizes the BAO concept for reduced power dissipation by examining the above three conditions and controls the cell's stand-by mode with a clock-gating signal *cell\_CLK*<sub>ij</sub> (see Figs. 3, 4). Since the cell-network has long global clock lines with large capacitances, we additionally restrict clock distribution to potentially active network cells by the method explained in Fig. 5. Detection of the rows including region-boundary cells that were activated in the previous clock cycle is carried out with an OR-function of the state signals of the cells. This detection function causes no additional overhead because it is required anyway for recognizing the end of region growing. The clock controller of Fig. 5 distributes the clock signal in the next clock cycle only to rows including cells that were excited in the previous clock cycle, and to their neighbor rows.
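The row-level clock restriction can be modeled behaviorally as below. This is a sketch under assumptions rather than the circuit itself: `zor` mirrors the row OR-signal $ZOR_i$ named in the caption of Fig. 5, while the function name `row_clock_enable` and the list-based representation are hypothetical.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def row_clock_enable(newly_excited: Set[Cell], rows: int) -> Tuple[List[bool], List[bool]]:
    """Return (zor, enable) per row.

    zor[i]    is True if row i contains a cell excited in the previous cycle
              (OR-function over the state signals of the row's cells).
    enable[i] is True if the clock is distributed to row i, i.e. if any of
              the rows i-1, i, i+1 has zor set.
    """
    zor = [False] * rows
    for r, _c in newly_excited:
        zor[r] = True
    enable = [
        any(zor[k] for k in (i - 1, i, i + 1) if 0 <= k < rows)
        for i in range(rows)
    ]
    return zor, enable
```

When no entry of `zor` is set, no cell was excited in the previous cycle, which is the same information needed to detect the end of region growing.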

## 4. Test-Chip Design and Performance Measurements

We designed and fabricated a video segmentation test-chip which implements a cell-network with the described BAO architecture in a 0.35µm 2-Poly 3-Metal CMOS technology. Figure 6 shows the die photo of the fabricated chip including a cell-network for $41 \times 33$ (1,353) pixels on an area of 51.1 mm<sup>2</sup>. The integration density achieved in the full-custom design is 26.5pixel/mm<sup>2</sup>, which is two times better than a standard-cell-based implementation. Measured power dissipation for a worst-case input image (only one homogeneous region) is 94.0mW at 10MHz (0.069mW/pixel) in the segmentation phase. The worst-case power dissipation of a previously designed $10 \times 10$ cell-network without BAO [3], which has a twelve-times smaller cell number, is 30.9mW at 10MHz (0.309mW/pixel). Therefore, about 78% power reduction per pixel has been achieved with the BAO concept. Average power dissipation, estimated with a 7-segment input image, is 45.8mW. Estimated segmentation time and Si-area consumption with the BAO architecture for QVGA-size images are <250µsec at 10MHz (Fig. 7) and <120mm<sup>2</sup> in a 90nm CMOS technology (Fig. 8), respectively. The characteristic data of the test-chip are summarized in Table I.
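The per-pixel figures quoted above follow directly from the measured totals. The short calculation below reproduces them as a consistency check; all input numbers are taken from this section, and the variable names are chosen only for illustration.

```python
# Test-chip with BAO: 41 x 33 cells on 51.1 mm^2, 94.0 mW worst case at 10 MHz.
cells_bao = 41 * 33
density = cells_bao / 51.1                # pixel/mm^2
power_per_pixel_bao = 94.0 / cells_bao    # mW/pixel

# Previous 10 x 10 cell-network without BAO [3]: 30.9 mW worst case at 10 MHz.
cells_prev = 10 * 10
power_per_pixel_prev = 30.9 / cells_prev  # mW/pixel

# Relative per-pixel power reduction achieved with BAO.
reduction = 1.0 - power_per_pixel_bao / power_per_pixel_prev

print(f"cells: {cells_bao}")
print(f"density: {density:.1f} pixel/mm^2")
print(f"power per pixel with BAO: {power_per_pixel_bao:.3f} mW")
print(f"power per pixel without BAO: {power_per_pixel_prev:.3f} mW")
print(f"reduction: {reduction:.0%}")
```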

## 5. Conclusions

We designed and fabricated a cell-network with $41 \times 33$ cells in 0.35µm CMOS technology for low-power video segmentation and experimentally confirmed the effectiveness of the proposed boundary-active-only (BAO) architecture. Compared with our previously proposed segmentation architecture without BAO [3], about 78% power reduction per cell is achieved at 10MHz clock frequency.

By additionally applying pipeline processing with tiled images [4], VGA-size video segmentation is expected to become possible with this $41 \times 33$ cell-network. The segmentation performance for VGA-size input images is estimated as 7.49msec segmentation time at 10MHz clock frequency and <94.0mW power dissipation.

## Acknowledgments

The test-chips in this study have been fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Rohm Corporation and Toppan Printing Corporation. Part of this work was supported by the 21st Century COE program of the Ministry of Education, Science and Culture, Japanese Government, and a Grant-in-Aid for JSPS Fellows, 1650741, 2004.

Figure 1: Block diagram of the cell-network, implemented by alternately laying cells and connection-weight-register blocks.

Figure 2: Conceptual diagram of the proposed boundary-active-only (BAO) scheme. (a) shows the excited and newly excited regions at clock cycle t and t+1, respectively. (b) Only cells which are neighbors of excited cells at t+1 are activated in clock cycle t+2. Filled circles mark cells in active mode, open circles cells in stand-by mode.

Figure 4: Circuit diagram of the BAO controller in each cell for cell-internal power reduction.

Figure 6: Die photo of the network with BAO including $41 \times 33$ cells, designed in a $0.35 \mu m$ 3-metal CMOS technology. The layout of cell and connection-weight-register blocks is magnified on the right side.

Figure 3: Block diagram of the cell ij with BAO controller. Blocks shown: state registers ($x_{ij}$, $p_{ij}$, $l_{ij}$, $N_{ij}$), control unit, weight calculation unit for the connection weights between 8 adjacent cells, threshold comparator for deciding the excitable state, and local BAO controller.

## References

- [1] The homepage of the Moving Picture Experts Group 2004, URL http://www.chiariglione.org/mpeg/
- [2] T. M. Strat, Proc. of Workshop and Exhibition on MPEG-4, pp.53-57 (2001).
- [3] T. Morimoto, et al., Ext. Abst. of SSDM2002, pp.242-243 (2002).
- [4] T. Morimoto, et al., Ext. Abst. of SSDM2003, pp.146-147 (2003).



Figure 5: Block diagram of BAO exploitation for power-reduction of clock distribution. Rows *i*, containing excited cells in the last clock cycle, are detected ( $ZOR_i=1$ ) from the state signals of the row cells. The clock controller distributes the clock only to rows containing these cells and their nearest neighbor rows {*i*-1, *i*, *i*+1}.



Figure 7: Image segmentation time estimation of the weight-parallel architecture for larger image sizes at 10MHz clock frequency. For 320×240 (QVGA) images, very high-speed image segmentation with <250µsec (avg.) is possible in 0.35µm technology.

Figure 8: Chip-size estimation for the weight-parallel architecture at the 90nm technology node with 5 metal layers as a function of the image size.

Table I: Characteristic data of the designed test-chip.

| Parameter                                              | Value                                                          |
|--------------------------------------------------------|----------------------------------------------------------------|
| Technology                                             | 0.35µm, 2-Poly 3-Metal CMOS                                    |
| Cell Architecture                                      | Weight-Parallel (high-speed) [3]                               |
| Design Area                                            | 6.9mm × 7.4mm (41×33 cells)                                    |
| Supply Voltage                                         | 3.3V                                                           |
| Max Clock Frequency                                    | 10MHz                                                          |
| Segmentation Time (41×33 pixels)                       | 34µsec @ 10MHz (worst case)                                    |
| Worst Case Power Dissipation (measured, 41×33 pixels)  | 94.0mW @ 10MHz (segmentation); 192mW @ 10MHz (initialization)  |
| Pixel Density                                          | 26.5pixel/mm<sup>2</sup>                                       |