# F-4-3

# Parallel Image Processing LSI Fabricated Using Three-Dimensional Integration Technology

D. Kawae, T. Nakamura, Y. Yamada, T. Morooka, Y. Igarashi, J. C. Shim, H. Kurino and M. Koyanagi

Dept. of Machine Intelligence and Systems Engineering, Tohoku University, 01 Aza Aoba, Aramaki, Aoba-ku, Sendai-shi, Miyagi 980-8579, Japan. Phone: +81-22-217-6908, Fax: +81-22-217-6907, Mail: sdlab@sd.mech.tohoku.ac.jp

## 1. Introduction

Recently, the demand for very fast image processing LSIs with real-time operation capability has rapidly increased. However, such LSIs are very difficult to achieve because the achievable image processing speed is limited by the conversion of the 2D image data array into a 1D image data stream. We have therefore been developing a very fast image processing LSI with real-time operation capability by employing 3D LSI technology [1]. In the parallel image processing 3D LSI, 2D image data can be processed as they are, since many short interconnections are available in the vertical direction, which dramatically reduces wiring delays. Very fast image processing therefore becomes possible. In this paper, we describe the concept of a new parallel image processing 3D LSI that enables very fast image processing, and we demonstrate the basic function of a Laplacian image processing 3D chip as an example of such parallel image processing 3D LSIs.

## 2. Concept of parallel image processing 3D LSI

In the parallel image processing 3D LSI, one image frame is divided into many processing units that can be operated in parallel. Each processing unit accommodates 8 by 8 pixel image sensors, one amplifier, one ADC, a block of data latches and one ALU, as shown in Fig.1. The signal data are processed sequentially within each processing unit; however, a very fast pipeline scheme is employed in each processing unit in order to accelerate the processing. A two-stage pipeline operation is performed in this 3D LSI. The first stage includes the image sensors, the amplifier and the ADC. The second stage includes the shift register and the ALU. A number of processing units with this pipeline scheme are arrayed to cover one image frame, as shown in Fig.2. Thus, very high performance can be achieved in the image processing 3D LSI by combining parallel processing with high-speed pipeline operation.
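The benefit of combining the two-stage pipeline with parallel processing units can be illustrated with a simple cycle count. The sketch below is not taken from the design itself: it assumes that each stage takes exactly one clock cycle per pixel, and the function names are hypothetical.

```python
# Hypothetical cycle-count model. The one-cycle-per-stage latency is an
# assumption for illustration; the actual stage timings are not given here.
PIXELS_PER_UNIT = 8 * 8  # pixels handled by one processing unit (from the text)
N_STAGES = 2             # sensor/amplifier/ADC stage, then shift register/ALU stage


def sequential_cycles(n_pixels: int, n_stages: int = N_STAGES) -> int:
    """Cycles when each pixel passes through every stage before the next starts."""
    return n_pixels * n_stages


def pipelined_cycles(n_pixels: int, n_stages: int = N_STAGES) -> int:
    """Cycles when stages overlap: fill the pipeline once, then one pixel per cycle."""
    return n_stages + (n_pixels - 1)


def frame_cycles(n_units: int, n_pixels: int = PIXELS_PER_UNIT) -> int:
    """All units run in parallel, so the frame time does not depend on n_units."""
    return pipelined_cycles(n_pixels)


print(sequential_cycles(PIXELS_PER_UNIT))  # 128
print(pipelined_cycles(PIXELS_PER_UNIT))   # 65
print(frame_cycles(9))                     # 65
```

Under these assumptions the pipeline roughly halves the time per processing unit, and adding processing units enlarges the frame without lengthening the frame time.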

## 3. Design of Laplacian image processing 3D chip

In order to confirm the basic function of the parallel image processing 3D LSI, we designed a Laplacian image processing 3D chip with an edge detection function. In this work, the ALU is hard-coded for the Laplacian operation using 4 adjacent pixels. When the Laplacian operation is executed at a target pixel on the edge of a processing unit, data from the adjacent processing units are required. These data are supplied from the ADCs of the adjacent processing units. As a result, one ALU processes the data from 96 pixels. To store these pixel data, we prepared shift registers, which exploit the temporal locality of the processed data to reduce the required storage capacity from 96 pixels to 23 pixels. To operate all processing units in parallel, the processing units are placed in an array as shown in Fig.3, where the position of the shaded circle indicates the point in a processing unit at which data first enter the register from the ADC, and the arrow indicates the direction of data scanning within a processing unit. A block diagram of one processing unit is shown in Fig.4.
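The 96-pixel figure is consistent with an 8 by 8 block plus one strip of 8 pixels from each of the four neighbouring units (64 + 4 x 8 = 96), since a 4-neighbour operator never reads the diagonal corner pixels. The following sketch illustrates this for a single processing unit. It is a software model only, not the hard-coded ALU: the sign convention of the Laplacian, the zero value assumed outside the frame, and all function names are assumptions.

```python
UNIT = 8  # pixels per side of one processing unit (from the text)


def unit_inputs(frame, ur, uc, unit=UNIT):
    """Collect the pixels one ALU reads: its own block plus the four edge strips
    of the neighbouring units. Corner pixels are not needed for a 4-neighbour
    operator. Pixels outside the frame are taken as 0 (an assumption)."""
    rows, cols = len(frame), len(frame[0])
    r0, c0 = ur * unit, uc * unit
    coords = [(r, c) for r in range(r0, r0 + unit) for c in range(c0, c0 + unit)]
    coords += [(r0 - 1, c) for c in range(c0, c0 + unit)]     # strip above
    coords += [(r0 + unit, c) for c in range(c0, c0 + unit)]  # strip below
    coords += [(r, c0 - 1) for r in range(r0, r0 + unit)]     # strip to the left
    coords += [(r, c0 + unit) for r in range(r0, r0 + unit)]  # strip to the right
    return {(r, c): frame[r][c] if 0 <= r < rows and 0 <= c < cols else 0
            for r, c in coords}


def process_unit(frame, ur, uc, unit=UNIT):
    """4-neighbour Laplacian of one unit, using only the pixels in unit_inputs.
    Sign convention (neighbours minus 4 x centre) is an assumption."""
    px = unit_inputs(frame, ur, uc, unit)
    r0, c0 = ur * unit, uc * unit
    return [[px[r - 1, c] + px[r + 1, c] + px[r, c - 1] + px[r, c + 1] - 4 * px[r, c]
             for c in range(c0, c0 + unit)]
            for r in range(r0, r0 + unit)]


# Illustrative 24 x 24 frame with a vertical step edge between columns 11 and 12.
frame = [[0 if c < 12 else 15 for c in range(24)] for r in range(24)]

print(len(unit_inputs(frame, 1, 1)))  # 96
print(process_unit(frame, 1, 1)[0])   # [0, 0, 0, 15, -15, 0, 0, 0]
```

The centre unit responds only at the two columns that straddle the step, which is the edge detection behaviour described above. Because each unit needs only its own 96 inputs, all units can be evaluated independently.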

## 4. Fabrication of Laplacian image processing 3D chip

Using 3D integration technology, we fabricated the 3D chip with 3 device layers. The design parameters are listed in Table I. Fig.5 shows the photomicrograph of each layer in the 3D chip: the photo detectors were arrayed in the first layer, the registers were formed in the second layer, and the ALUs and ADCs were placed in the third layer. Fig.6 shows the photograph of the 3D chip, which was stacked under the glass and bonded to the package. Every pad was formed on the backside of the third layer and connected to the connection board through bumps. The test chip has 9 processing units. Hence, one image frame consists of 24 by 24 pixels in the 3D chip. A sample input image and the corresponding ADC output image are shown in Fig.7. We successfully demonstrated that the photo detection circuits and the ADCs operated well, although the chip still has some process and design issues [2].

## 5. Conclusions

We proposed a parallel image processing LSI with extremely high image processing speed using 3D integration technology. In the parallel image processing 3D LSI, one image frame is divided into many image processing units, and these processing units are operated in parallel. Furthermore, we utilize pipeline operation to accelerate the processing within each processing unit. The Laplacian image processing 3D chip was designed and fabricated, and we confirmed the basic function of our parallel image processing 3D LSI using this chip.

## Acknowledgments

This work was performed in the Venture Business Lab., Tohoku Univ., and was partly supported by CREST (Core Research for Evolutional Science and Technology) of the Japan Science and Technology Corporation (JST) and by the Association of Super-Advanced Electronics Technologies (ASET).



Fig.1 Configuration of one processing unit in the system.







Fig.3 Array of processing units to form one frame.



## References

[1] K. W. Lee et al., IEDM Tech. Dig. (2000) p. 165.

[2] D. Kawae et al., Proc. SASIMI (1998) p. 229.

Table I Chip specification

| Parameter          | Value                                                     |
|--------------------|-----------------------------------------------------------|
| Technology         | CMOS 1.5 $\mu\mathrm{m}$, 3 device layers in a chip       |
| Vertical wire size | $26\,\mu\mathrm{m} \times 39\,\mu\mathrm{m}$              |
| Vertical wire R    | 20 Ω                                                      |
| Vertical wire C    | 0.1 fF                                                    |
| Chip size          | 6 mm x 6 mm / layer                                       |
| PU size            | 1.7 mm x 1.7 mm / layer                                   |
| # of PU            | 3 x 3                                                     |
| Image size         | 24 x 24 pixel / frame                                     |
| Fill factor        | 35%                                                       |
| # of vertical wire | 30 bit / PU                                               |
| AD conversion      | 4 bit flash type                                          |
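The 4 bit flash type AD conversion listed in Table I can be pictured with an idealized behavioural model: a flash converter compares the input against all reference levels at once, and the number of comparators that trip gives the output code. The sketch below assumes ideal comparators, a uniform reference ladder and a normalized reference voltage `v_ref`; none of these details are specified for the fabricated chip, and the function name is hypothetical.

```python
def flash_adc(v_in: float, v_ref: float = 1.0, bits: int = 4) -> int:
    """Idealized flash conversion (assumed model, not the chip's circuit).

    The input is compared with 2**bits - 1 ladder taps simultaneously,
    producing a thermometer code whose number of ones is the output code.
    """
    levels = 2 ** bits
    thresholds = [v_ref * k / levels for k in range(1, levels)]  # 15 taps for 4 bits
    thermometer = [v_in >= t for t in thresholds]
    return sum(thermometer)  # count of tripped comparators


print(flash_adc(0.0))  # 0
print(flash_adc(0.5))  # 8
print(flash_adc(1.0))  # 15
```

With 4 bits, each pixel is represented by one of 16 levels, which sets the grey-scale resolution of the ADC output image.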

Fig.5 Photomicrograph of test chip (before stacking). Panels: 1st layer: sensor; 2nd layer: register; 3rd layer: ADC & ALU.



Fig.6 Photomicrograph of test chip (after stacking).

Fig.7 Sample input image and corresponding ADC output. Panels: Sample input; ADC output.