Vision Chips
or
Seeing Silicon

Third Revision

Alireza Moini
March 1997

The Centre for High Performance
Integrated Technologies and Systems
The University of Adelaide
SA 5005, Australia
Tel: 61 8 8303 3403
Fax: 61 8 8303 4360
Email: moini@eleceng.adelaide.edu.au

Department of Electronics Engineering
The University of Adelaide
SA 5005, Australia

Vision Chips or Seeing Silicon

Alireza Moini

The Centre for High Performance Integrated Technologies and Systems
Department of Electrical & Electronics Engineering
The Univ. of Adelaide, SA 5005, Australia
Tel: +61 8 8303 3403, Fax: +61 8 8303 4360
email: moini@eleceng.adelaide.edu.au
WWW: http://www.eleceng.adelaide.edu.au/Personal/moini/

March 1997
# Contents

1 Introduction
1.1 Smart sensors
1.2 Advantages and disadvantages of vision chips
1.3 Challenges
1.4 Technology
  1.4.1 CMOS
  1.4.2 BiCMOS
  1.4.3 CCD and CMOS/CCD
  1.4.4 GaAs MESFET and HEMT
1.5 Major groups working on vision chips
1.6 How vision chips are presented in this report

Acknowledgements

2 Spatial Vision Chips
2.1 Introduction
2.2 Mahowald and Mead’s silicon retina
2.3 Mead’s adaptive retina
2.4 Mahowald and Delbrück’s stereo matching chips
2.5 Bernard et al.’s Boolean artificial retina
2.6 Andreou and Boahen’s silicon retina
2.7 Kobayashi et al.’s image Gaussian filter
2.8 PASIC sensor from Linköping University
2.9 MAPP2200 sensor from IVP
2.10 Forchheimer-Aström’s NSIP sensor
2.11 Sandini et al.’s foveated CCD chip
2.12 IMEC-IBIDEM’s foveated CMOS chip
2.13 Wodnicki et al.’s foveated CMOS sensor
2.14 Standley’s orientation detection chip
2.15 Harris et al.’s Resistive Fuse Vision Chip
2.16 DeWeerth’s Localization and Centroid Computation Chip
2.17 Ward & Syrzycki’s Receptive Field Sensors
2.18 Wu & Chiu’s 2D Silicon Retina
2.19 Nilson et al.’s Shunting Inhibition Vision Chip
2.20 Keast & Sodini’s CCD/CMOS Imager and Processor
2.21 Mitsubishi Electric’s CMOS Artificial Retina with VSP
2.22 Venier et al.’s Solar Illumination Monitoring Chip

3 Spatio-Temporal Vision Chips

3.1 Introduction
3.2 Lyon’s eye
3.3 Tanner and Mead’s correlating motion detection chip
3.4 Tanner and Mead’s optic flow motion detection chip
3.5 Moore and Koch’s multiplicative motion detector
3.6 Bair and Koch’s motion detection chip
3.7 Delbrück’s focusing chip
3.8 Delbrück’s velocity tuned motion sensor
3.9 Meitzler et al.’s sampled-data motion chip
3.10 Moini et al.’s insect vision-based motion detection chip
3.11 Moini et al.’s second insect vision-based motion detection chip
3.12 Dron’s multi-scale veto CCD motion sensor
3.13 Horiuchi et al.’s delay line-based motion detection chip
3.14 Chong et al.’s change detector
3.15 Gottardi and Yang’s CCD/CMOS motion sensor
3.16 Kramer et al.’s velocity sensor
3.17 Indiveri et al.’s time-to-crash sensor
3.18 Indiveri et al.’s direction-of-heading detector
3.19 McQuirk’s CCD focus of expansion estimation chip
3.20 Gruss et al.’s range finder
3.21 Sarpeshkar et al.’s pulse mode motion detector
3.22 Meitzler et al.’s 2D position and motion detection chip
3.23 Aizawa et al.’s Image Sensor with Compression
3.24 Hamamoto et al.’s Image Sensor With Motion Adaptive Storage Time
3.25 Simoni et al.’s Optical Sensor and Analog Memory Chip with Change Detection
3.26 Espejo et al.’s Smart Pixel CNN
3.27 Moini et al.’s Shunting Inhibition Vision Chip
3.28 Etienne-Cummings et al.’s Motion Detector Chip
3.29 CSEM’s Motion Detector Chip for Pointing Devices

4 Chips for Vision

4.1 Introduction
4.2 Hakkarainen & Lee’s AVD CCD Chip for Stereo Vision
4.3 Ertas’s CMOS Chip for Stereo Correspondence

5 Optical Neuro Chips

5.1 Mitsubishi Electric’s Optical neurochip and retina
5.2 Yu et al.’s optical neurochip

6 Active Pixel Sensors

6.1 Introduction
6.2 JPL’s active pixel sensors
6.3 Fowler et al.’s pixel level ADC sensor
6.4 Technion’s Adaptive Sensitivity CCD Imager
6.5 Technion’s TDI CCD sensor

7 Principles & Building Blocks

7.1 Introduction

7.2 Phototransduction, the Doorway to Vision Chips
  7.2.1 Photodetector Elements
  7.2.2 Quantum Efficiency of a Vertical Junction Diode
  7.2.3 Quantum Efficiency of a Lateral Junction Diode
  7.2.4 Quantum Efficiency of a Vertical Bipolar transistor
  7.2.5 Quantum Efficiency of a Lateral Bipolar Photodetector
  7.2.6 Mixed structures
  7.2.7 Quantum Efficiency of a Photogate
  7.2.8 The Effect of Scaling on Photodetecting Elements
  7.2.9 Mismatch in Photodetecting Elements

7.3 Photocircuits
  7.3.1 Logarithmic Sensor Using MOS Diodes
  7.3.2 Photocircuit with Buffer-like Pull-up
  7.3.3 Photocircuit with Amplifier-like Pull-up
  7.3.4 Buffered Logarithmic Photocircuit
  7.3.5 Delbrück’s Adaptive Photocircuit
  7.3.6 Cascode Photocircuits
  7.3.7 Current Amplifier Photocircuit
  7.3.8 Integration Based Photocircuits

7.4 Circuits and techniques for active pixel sensors
  7.4.1 Photocircuits in active pixel sensors
  7.4.2 Correlated double sampling

7.5 Spatial Processing
  7.5.1 Linear Resistive networks
  7.5.2 Smoothing networks
  7.5.3 Nonlinear Resistive networks
  7.5.4 Resistive Circuits
  7.5.5 CCD Circuits for Spatial Processing

7.6 Spatio-Temporal Processing
  7.6.1 Analog Memory Elements
  7.6.2 Continuous Delay Elements

7.7 Adaptation
  7.7.1 Light Adaptive Photodetectors
  7.7.2 Light Adaptive Photocircuits
  7.7.3 Light Adaptive Architectures
  7.7.4 Spatial Adaptation Models

7.8 Practical issues in designing vision chips
  7.8.1 Mismatch
  7.8.2 Digital noise

7.9 Testing vision chips
  7.9.1 Design for Testing
  7.9.2 Tests and Measurements
  7.9.3 Test conditions
  7.9.4 Steady-state tests
  7.9.5 Spatio-temporal tests

A Other resources
B About this report  
References  

Chapter 1

Introduction

Smart vision systems will be an inevitable component of future intelligent systems. Conventional vision systems, based on the system-level integration (or even chip-level integration) of an imager (usually a CCD camera) and a digital processor, do not have the potential for application in general-purpose consumer electronic products. This is simply due to the cost, size, and complexity of these systems. Because of these factors, conventional vision systems have mainly been limited to specific industrial and military applications. Vision chips, which include both the photosensors and parallel processing elements (analog or digital), have been under research for more than a decade and show promising capabilities.

1.1 Smart sensors

The integration of photodetecting elements and processing circuits on the same chip, either to obtain better performance from sensors or to make the sensing and processing system more compact, is not a new idea. The concept of smart sensing, however, i.e. sensor information processing without redundant and unnecessary data acquisition and with processing at the sensor level, is relatively new. The term “smart-sensor” has sometimes been applied to sensors that merely integrate the sensing and processing modules, without any regard to the low-level interaction that can exist between the sensors and processors. With this meaning in mind, even a large system with a vidicon and a 100 kg mainframe computer could be called a smart-sensor. The only difference would be the size. Here I would like to use a more fundamental meaning for smart-sensors. “Smart-sensors are those devices in which the sensors and circuits co-exist, and their relationship with each other and with higher-level processing layers goes beyond the meaning of transduction. Smart-sensors are information sensors, not transducers and signal processing elements. Smart sensors are not general purpose devices. Everything in a smart sensor is specifically designed for the targeted application.” With this meaning in mind we exclude any camera-processor combination, even if the two are integrated on the same chip. However, sensors such as the NSIP architecture described in [Aström 93, Forchheimer and Aström 94] and the column-parallel architecture in [Hamamoto et al. 96a, Hamamoto et al. 96c], although they do not integrate the sensors and processors at the pixel level, still possess a tight relationship between the sensors and processors. In fact, these architectures suggest that some of the drawbacks of vision chips, such as loss of resolution and fill-factor, may be relieved while maintaining semi-parallel processing (in one dimension).

Traditional photodetectors could only output an analog signal, which required further signal conditioning. Even now, in most imagers the main focus is on the quality of the imaging in terms of noise, resolution, speed, and so on. It is assumed that further signal and image processing stages can acquire the imager output and process it. In contrast, in vision chips the main focus is on the quality of processing. The implementation of a certain algorithm using existing components is given priority, and often some image characteristics, such as resolution, are sacrificed.

1.2 Advantages and disadvantages of vision chips

When compared to a vision processing system consisting of a camera and a digital processor, a vision chip provides many system level advantages. These include

- **Speed**: The processing speed achievable using vision chips exceeds that of the camera-processor combination. A main reason is the information transfer bottleneck between the imager and the processor. In vision chips, information is processed and transferred between the various levels of processing in parallel.

- **Large dynamic range**: Many vision chips use photodetectors and photocircuits which have a large dynamic range, covering at least 7 decades of light intensity. Many also have local and global adaptation capabilities which further enhance their dynamic range. Conventional cameras are at best able to perform global automatic gain control.

- **Size**: Using single-chip implementations of vision processing algorithms, very compact systems can be realized. The only parts of the system that may not be scalable are the mechanical parts (like the optical interface).

- **Power dissipation**: Vision chips often use analog circuits which operate in the subthreshold region. There is also no energy spent on transferring information from one level of processing to another.

- **System integration**: Vision chips may comprise most of the modules necessary for designing a vision system, such as image acquisition and low-level and high-level analog/digital image processing. From a system design perspective this is a great advantage over the camera-processor option.
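
To put the dynamic-range figure above in perspective, the following sketch converts an intensity range given in decades into decibels and into the number of bits a purely linear quantizer would need to span it. The function names, the 20·log10 decibel convention, and the 8-bit comparison are assumptions made for illustration; they are not figures taken from any chip in this report.

```python
import math

def decades_to_db(decades: float) -> float:
    """Intensity ratio of 10**decades in dB, assuming the 20*log10 convention."""
    return 20.0 * decades

def linear_bits_to_decades(bits: int) -> float:
    """Decades of intensity spanned by a linear quantizer with `bits` bits."""
    return math.log10(2 ** bits)

def linear_bits_needed(decades: float) -> int:
    """Smallest number of linear bits whose full scale covers `decades` decades."""
    return math.ceil(decades * math.log2(10))

print(decades_to_db(7))                      # 140.0
print(round(linear_bits_to_decades(8), 2))   # 2.41
print(linear_bits_needed(7))                 # 24
```

Under these assumptions, a linear 8-bit output spans fewer than three decades, which is why wide-range photocircuits typically compress or adapt the signal rather than represent it linearly.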

Although designing single-chip vision systems is an attractive idea, it faces several limitations:

- **Reliability of processing**: Vision chips are designed based on the concept that analog VLSI systems with low precision are sufficient for implementing many low-level vision algorithms. The precision of analog VLSI systems is affected by many factors, which are not usually controllable. As a result, if the algorithm does not account for these inaccuracies, the processing reliability may be severely affected. Vision chips also use unconventional analog circuits which may not be well characterized and understood.

- **Resolution**: In vision chips each pixel includes a photocircuit\(^1\) which occupies a large proportion of the pixel area. Therefore, vision chips have a low fill-factor and a low resolution. The largest vision chip reported has only \(210 \times 230\) pixels, with a photocircuit consisting of only six transistors [Andreou and Boahen 94a].

- **Difficulty of the design**: Vision chips implement a specific algorithm in a limited silicon area. Therefore, off-the-shelf circuits often cannot be used in the implementation, and many new analog circuits have to be designed. Vision chips are always full-custom designs, and full-custom design is known to be time-consuming and error-prone.

- **Programming**: None of the vision chips are general purpose. In other words, most vision chips cannot be programmed to perform different vision tasks. This inflexibility is particularly undesirable during the development of a vision system.
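
The resolution limitation above comes down to simple area arithmetic. The sketch below computes the pixel count of the \(210 \times 230\) array quoted in the text and the fill-factor of a pixel, defined here as photodetector area divided by total pixel area. The pixel pitch and detector area in the example are hypothetical numbers chosen only to illustrate the calculation.

```python
def pixel_count(rows: int, cols: int) -> int:
    """Total number of pixels in a rows x cols array."""
    return rows * cols

def fill_factor(detector_area_um2: float, pixel_pitch_um: float) -> float:
    """Fraction of a square pixel occupied by the photodetector."""
    return detector_area_um2 / (pixel_pitch_um ** 2)

# Array size quoted in the text.
print(pixel_count(210, 230))       # 48300

# Hypothetical pixel: 40 um pitch with a 20 um x 20 um photodetector.
print(fill_factor(20.0 * 20.0, 40.0))   # 0.25
```

Every transistor added to the photocircuit either enlarges the pixel pitch, which lowers the pixel count for a given die size, or shrinks the detector, which lowers the fill-factor.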

1.3 Challenges

Vision chip design is a challenging task. The designer has to consider issues ranging from visual processing algorithms to low-level circuit design problems, from phototransduction principles to high-level VLSI architectural issues, from mismatch and digital noise to readout techniques, from optics to electronics and optoelectronics, from pure analog to mixed analog/digital to pure digital design problems, and from biologically inspired vision models to intuitive models to computational models.

A vision chip requires photodetecting elements, image acquisition circuits, analog conditioning and processing circuits, digital processing and interfacing, and image readout circuitry, all on the same chip. Many of these components, such as low-level analog processing elements, must be replicated once per photodetector. In most cases these components should interact with at least their nearest neighbors. The area required for implementing the circuits and routing the information across the chip has put upper bounds on the realization of reliably functional and high-resolution vision chips, and in most implementations either resolution or functionality has been sacrificed for the other. The design of vision chips can obviously benefit from the high level of integration in current VLSI processes, where more than 10 million transistors can be integrated on a single chip. Unfortunately, advanced processes for high-level integration are usually tuned and characterized for leading-edge digital processors and DRAMs, and suffer from sub-micron effects, such as short channel effects, hot-carrier effects, band-to-band tunneling, gate-oxide direct tunneling, gate induced drain leakage (GIDL), drain induced barrier lowering (DIBL), and poor threshold voltage control [Fienga et al. 94]. Many available processes, on the other hand, do not offer any specific photodetecting element and are not well tuned for analog circuit design. Device mismatch also severely affects analog circuit design, and almost no fabrication processes have been carefully characterized and modeled to account for mismatch.

The design of vision chips has also been affected by the lack of VLSI-friendly computer vision algorithms. Most current computational computer vision algorithms are very complex and can hardly be run in real time even on powerful workstations. Many computer vision algorithms are still not reliable enough for application in general uncontrolled environments. Biologically inspired algorithms, on the other hand, rely on the fact that many creatures have developed very efficient visual systems. These algorithms, however, are not mature enough and suffer from excessive simplification caused by insufficient understanding of animals' visual systems.

---

\(^1\)I have adopted the term "photocircuit" because of its clear and sharp reference to circuitry which processes the photocurrent or photovoltage. Other terms, such as "photoreceptor", have been used interchangeably both for single photodetectors and for the circuitry used for processing photocurrents, and become confusing in a context full of such references.

Despite these facts, the design of single-chip VLSI vision sensors, or smart vision sensors, is progressing steadily, and many vision chips based on biological or computational algorithms have been developed in the past few years. The complexity of vision chips has also increased significantly, and 2D vision chips with more than 48,000 detectors and processing elements have been designed [Andreou and Boahen 94b].

1.4 Technology

Different technologies offer advantages and disadvantages for the design of vision chips. The dominant technologies available to date are CMOS, BiCMOS, CCD, and GaAs (MESFET and HEMT). CMOS has been used extensively in many designs. The additional bipolar transistor in BiCMOS processes, though advantageous in achieving better matching properties and higher speeds, is not easily justifiable when other factors in the design are taken into account. While commercial-grade CMOS processes are accessible through fabrication brokers, such as MOSIS and CMP, the CCD processes available for prototyping are of low quality. GaAs processes have been used only to a very limited extent because there are no readily available photodetector structures in such processes and, more importantly, analog circuit design is severely limited by gate leakage in MESFET and HEMT transistors. In the following sections the advantages and disadvantages of each process are highlighted.

1.4.1 CMOS

CMOS has been and will remain the dominant technology in almost all VLSI design areas, including vision chips. This is a direct result of the following advantages offered by CMOS processes.

- **Mature technology**: CMOS processes are well established and continue to mature. The strong push from leading-edge digital memories and processors has led to continuous improvement and down-scaling of CMOS processes.

- **Design resources**: Circuit and system design in CMOS is supported by a vast number of resources. Many design techniques and design libraries for analog and digital design are available.

- **Availability**: CMOS processes are now readily available for prototype designs through fabrication brokers, at low prices. This has allowed design knowledge to grow through real implementations rather than purely theoretical treatments.

- **Price**: CMOS is the cheapest process available when compared against other technologies with the same minimum feature size.

The major disadvantages of CMOS technology for implementing vision chips are:

- **Analog circuit design**: Leading-edge processes are not characterized and tuned for analog circuit design.

- **Photodetectors**: The photodetector structures are not characterized in any of the processes. It is the designer’s responsibility to ensure that the photodetectors function as desired.

- **Second order effects**: In the scaling process some second-order device characteristics, such as subthreshold operation, are usually ignored or given less attention, and process developers aim to suppress them rather than improve them.

- **Mismatch**: Mismatch in CMOS devices is relatively high. This is especially detrimental to the reliability of analog processing in vision chips.

1.4.2 BiCMOS

BiCMOS processes provide an additional bipolar device, which has been the workhorse of analog design. The bipolar transistor can be used to increase the speed, reduce the mismatch, and obtain better circuit characteristics when an exponential I-V relationship is required. However, the use of BiCMOS processes has been limited by their complexity and cost. The large area required for each bipolar transistor also makes them unattractive for large vision chips.

1.4.3 CCD and CMOS/CCD

CCD processes were originally developed for analog signal processing and imaging devices. Although this could have facilitated the design of vision chips, their drawbacks have meant limited success in achieving functional and reliable vision chips. The major drawbacks of CCD and CCD/CMOS with respect to CMOS are:

- **Clocking**: Even simple operations require a large number of clock phases, and these clock phases have to be distributed to all cells.

- **Process optimization**: Special CCD processes do not have optimized CMOS devices, and CCD/CMOS processes do not have optimized CCD structures.

- **Special read and write circuits**: Transferring signals between the CMOS and CCD parts of a CCD/CMOS circuit requires special read/write circuits.

- **Large area**: Each cell occupies a large area because of the above items.

- **Digital noise**: In the mixed CCD/CMOS approach the clocks induce massive noise in the analog circuits.

- **Power**: Power is consumed by the large voltage transients required for clocking the gates of CCD structures (large capacitive loads).

Despite these numerous drawbacks, CCDs offer easier solutions for some operations. For example, in a smoothing CCD vision chip the smoothing width can be increased simply by letting the circuit operate over more clock cycles. In other words, CCDs are capable of iterating a function without demanding additional space.
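
The idea of widening a smoothing kernel by iteration rather than by extra hardware can be illustrated numerically. The sketch below is not a model of CCD charge transfer; it assumes that one clocked pass is equivalent to a nearest-neighbour average with weights [1/4, 1/2, 1/4] (an assumption made only for this example) and shows how the width of the response to a point input grows with the number of passes.

```python
def smooth_once(signal):
    """One pass of a [1/4, 1/2, 1/4] nearest-neighbour average (edges replicated)."""
    n = len(signal)
    out = []
    for i in range(n):
        left = signal[max(i - 1, 0)]
        right = signal[min(i + 1, n - 1)]
        out.append(0.25 * left + 0.5 * signal[i] + 0.25 * right)
    return out

def smooth(signal, passes):
    """Apply the same one-pass operator `passes` times, reusing the same cells."""
    for _ in range(passes):
        signal = smooth_once(signal)
    return signal

def spread(profile):
    """Standard deviation, in pixels, of a non-negative profile."""
    total = sum(profile)
    mean = sum(i * v for i, v in enumerate(profile)) / total
    var = sum((i - mean) ** 2 * v for i, v in enumerate(profile)) / total
    return var ** 0.5

# Point input in the middle of a 101-pixel line, far from the edges.
impulse = [0.0] * 101
impulse[50] = 1.0

for passes in (1, 4, 16):
    # The width grows as sqrt(passes / 2): about 0.707, 1.414, 2.828 pixels.
    print(passes, round(spread(smooth(impulse, passes)), 3))
```

Each pass adds a fixed amount to the variance of the effective kernel, so the smoothing width is set by the number of passes alone and the cell itself never changes.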

1.4.4 GaAs MESFET and HEMT

GaAs processes are known for their high-speed operation in digital and analog circuits. They have also been used in opto-electronic devices. GaAs processes suffer from several problems:

- **Maturity of technology**: The processes are not mature. It was only recently that GaAs processes could achieve high integration (on the order of one million transistors).

- **Analog design**: Analog circuit design is limited by the Schottky diode at the gate, which is oriented in the forward-bias direction. For an enhancement-mode MESFET this leaves a gate-source voltage range of only 0.1 V to 0.6 V.

- **Price and availability**: GaAs processes are generally expensive and not easily accessible.

- **Opto-electronic devices**: These are only available in very specialized processes.

1.5 Major groups working on vision chips

- The work of Carver Mead’s group at the California Institute of Technology, starting with Lyon’s optical mouse designed in 1980 (see section 3.2), is a major contribution to this exotic and fascinating area of VLSI design. The idea of neuromorphic engineering using VLSI technologies was first introduced by Carver Mead and bloomed into several analog VLSI chips appearing in “the Bible of analog VLSI”, *Analog VLSI and Neural Systems*, published by Addison-Wesley in 1989 [Mead 89b]. This work is still continuing in the *Carver-Lab* at Caltech. The research emphasis in this group is on analog VLSI systems. In the past they have designed many analog VLSI chips based on biological models of vision, the cochlea, and other neural systems.

- *Koch-Lab*, also at Caltech and led by Professor Christof Koch, has focused on modeling biological neural systems and implementing them in analog VLSI. Research in the laboratory focuses on several areas:
  - Biophysics of Computation in Single Neurons
  - Cortical Circuits Underlying Motion and Visual Attention
  - Psychophysics of Attention and Awareness
  - The Neuronal Correlate of Visual Awareness and Consciousness
  - Neuromorphic Analog VLSI Vision Systems

- The analog VLSI group at Johns Hopkins University, led by Professor Andreas Andreou, has interests in analog VLSI systems similar to those of the Carver-Lab. Analog VLSI chips based mainly on biological models have been designed in this lab. Some system designs from this lab include analog VLSI models of auditory processing, early vision and silicon retinas, associative memory, adaptive neural networks, and speech recognition.

- The VLSI group at Laval University is led by Marc Tremblay. The research is principally inspired by computational needs in computer vision. The VLSI projects are focused on the design and development of smart sensors. Some of the research topics include the MAR-Camera systems, motion detection, and linear arrays for 3-D cameras.

- The Image Processing Group at Linköping University in Sweden has been developing vision cameras with processing capabilities. The group, led by Anders Aström, has developed the commercially available LAPP and MAPP series cameras.

- The University of Adelaide in Australia has been pursuing the insect-vision-based motion detector project since 1991. Inspired by the simplicity of the insect visual system, and using a VLSI-friendly model for insect vision, the first insect vision chip was designed in 1992. The project, led by Abdesselam Bouzerdoum and Kamran Eshraghian, is growing rapidly both in the number of people involved and in the aspects of the design being covered. The work is funded by strong industrial partners and the federal government of Australia.

- The IMEC and IBIDEM consortium, involving several universities in Spain and Italy, has focused on the design of space-variant sensors, more specifically the "foveated sensors". The log-polar mapping performed by these sensors is very attractive for applications requiring rotation- and scale-invariant processing.

- A research group at MIT has concentrated on the implementation of early visual processing using CCD and CMOS technologies. In this set of projects they have targeted vision tasks and algorithms requiring low precision. The stated reason for selecting CCD as the base technology for the implementation is the compact size achievable. Some of the chips designed as a part of this project include:
  - CMOS image moment and orientation chip [Standley 91b]
  - CCD/CMOS image smoothing and segmentation chip [Keast and Sodini 92]
  - CCD image feature extraction chip [Chiang et al. 90, Chiang and Chuang 91]
  - CCD/CMOS focus of expansion chip
  - CCD multiscale veto motion sensor [Dron 94]
  - CCD/CMOS stereo chip

- The VLSI Systems group at Southern Illinois University at Carbondale is working on the VLSI design of vision chips for real-time dynamic tasks encountered in manufacturing and assembly, auto-navigation, and unmanned vehicles and robotics.

- The Hatori-Aizawa Lab. at Tokyo University has focused on compression sensors and adaptive imagers for on-chip compression and adaptation.

- The VLSI group at Technion, Israel, headed by Ran Ginosar, has been developing adaptive sensitivity smart imagers and techniques for improving the scanning of imagers.

- A group at Mitsubishi Electric has been developing optical neurochips using exotic III-V compound structures. The main focus of the research has been on optical interconnection and neural network architectures.

- The “SYNERGY” Lab. at Arizona State University, led by Lex Akers, has concentrated on designing vision chips and adaptive smart sensors (“camera on chip”).

- A group at EPFL in Switzerland, led by Eric Vittoz, is working on analog VLSI systems. They have designed several vision chips. Due to agreements with industrial partners the work remains unpublished.

- The VLSI group at the University of Sevilla in Spain, which is associated with the “National Centre of Microelectronics”, has been developing vision chips based on cellular neural networks (CNN).

- The Analog Computation Group at the University of Florida is headed by John Harris. The work in this group includes analog VLSI circuits for sensory processing, neural networks, and neurobiological models.
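
The attraction of the log-polar mapping mentioned above for foveated sensors is that scaling and rotation of the image about the centre of the sensor turn into plain shifts in the mapped coordinates. The sketch below checks this for a single point; the function names and the test values are assumptions made for this illustration and do not describe any particular sensor layout.

```python
import math

def to_log_polar(x: float, y: float):
    """Map a Cartesian point (not the origin) to (log radius, angle in radians)."""
    return math.log(math.hypot(x, y)), math.atan2(y, x)

def scale_and_rotate(x: float, y: float, scale: float, angle: float):
    """Scale by `scale` and rotate by `angle` radians about the origin."""
    c, s = math.cos(angle), math.sin(angle)
    return scale * (c * x - s * y), scale * (s * x + c * y)

point = (3.0, 4.0)
rho0, theta0 = to_log_polar(*point)
rho1, theta1 = to_log_polar(*scale_and_rotate(*point, scale=2.0, angle=math.pi / 6))

# Scaling shifts the log-radius axis by log(scale) ...
print(rho1 - rho0, math.log(2.0))
# ... and rotation shifts the angular axis by the rotation angle.
print(theta1 - theta0, math.pi / 6)
```

Because both shifts are independent of the point chosen, a pattern that is scaled or rotated about the centre keeps its shape in log-polar coordinates and only moves, which is what makes subsequent matching simpler.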

1.6 How vision chips are presented in this report

The revolutionary ideas implemented on silicon chips, which will be described in the following sections, are blooming and opening the doors for another information processing tool, smart-visual-sensing. This report is a survey of the vision chips that have been designed in the past decade. It tries to give a concise and simple description of each design and to draw the reader’s attention to the specific idea that a particular vision chip has brought about. For each chip a brief description of its function is given. Important architectural and circuit-level design aspects are also presented. Basic principles of vision algorithms and circuit design will not be described, at least not in detail. The reader is expected to have a good knowledge of image processing and a general knowledge of analog and digital VLSI design. Wherever available, some information about the fabricated chips, such as cell size, chip size, and process, is also provided. In future revisions more information about the performance of each chip will be presented. Of course, as the computer vision community still lacks proper benchmarks and criteria for image processing performance at different levels, most of the performance measures will be those related to VLSI, such as speed, power, and contrast sensitivity, rather than computer vision related measures.

There are some links (in the html format of the report) to the relevant home pages of the authors of each work, to keep the vision-chip community as closely in touch with each other’s work as possible. Some of the documents are also linked to online postscript versions of the articles relevant to that chip, which can be downloaded and printed on a postscript printer.

In this third revision of the vision chips document, I have added many new vision chips, and the report now comprises more than fifty vision chips. The document encompasses seven chapters.

In chapter 2 Spatial Image Processing Vision Chips are presented. This includes chips for edge detection, smoothing, stereo processing, and contrast enhancement (silicon retinas), in addition to chips for finding global features of the image.

Chapter 3 covers Spatio-Temporal Image Processing Vision Chips, dominated by motion detection chips. Although motion detection chips intrinsically include spatial processing, the cooperative time-space processing makes them different from other vision chips. There are also a few purely temporal processing implementations.

Chapter 4 presents a few analog VLSI chips for vision processing. These chips do not have on-chip photodetectors, and the image is produced by external imagers. These implementations cannot be regarded as vision chips in any sense. It is rather the processing and the vision-related implementations that make them interesting for this report. They also represent the type of processing that can be performed, still in the analog domain, on the output of any vision chip.

In chapter 5 *Optical Neuro Chips* are described. These chips architecturally belong to the *analog VLSI neuro-chips* family, with the exception that the medium for signal transmission is chosen to be optical. However, as the optical neuro chips designed so far aim at image processing, they are included in this report. The role of neural networks in many new image processing algorithms can also justify this inclusion.

Chapter 6 describes some of the active pixel sensor (APS) chips. They are relevant to this report because in these imagers the attention is focused on the quality of the imaging, while in vision chips the implementation of an algorithm is the main concern. Many of the methods developed for enhancing the performance of APSs can be adopted for vision chips.

And finally, chapter 7, *Designing Vision Chips: Principles and Building Blocks*, presents general principles, design limitations, design variables, guidelines to be considered in the design of vision chips, and testing procedures. The various components of vision chips, i.e. photodetectors, photocircuits, and spatial and temporal processing circuits, are also presented in more detail.

Appendices describe some information about this report, such as tools used to generate the document, and give reference to other major resources, on-line or off-line, about vision chips.
Acknowledgments

Creating this report has been a pleasing reward for me. In the first few days after the first public announcement in 1994 I experienced the great encouragement of the VLSI-Vision community. I was given so many fantastic ideas and suggestions in designing this report. I received information about many recent and past works, which helped me make the report more complete than what I had in mind. Since then I have received continuous and unfading support in various ways, by informing me of other vision chips, by correcting my misunderstanding of technical details, and in many other ways.

I would like to thank all those who encouraged me, supported the idea, and helped me in different ways. They are, in alphabetical order:

- Prof. Kiyoharu Aizawa (Tokyo Uni., Japan)
- Prof. Andreas Andreou (Johns Hopkins Uni., USA)
- Miguel Arias (Laval Uni., Canada)
- Dr. Thierry Bernard (ETCA/CREA/SP, France)
- Dr. Abdesselam Bouzerdoum (Adelaide Uni., Australia)
- Dr. Tobi Delbrück (Synaptics Inc., USA)
- Bart Dierickx (IMEC, Leuven, Belgium)
- Prof. Lisa Dron (Northeastern Uni., USA)
- Dr. Gamze Erten (IC Tech Inc., USA)
- Dr. Ralph Etienne-Cummings (Southern Illinois Uni, USA)
- Prof. Gerhard Fasol (University of Tokyo, Japan)
- Dr. Robert Forchheimer (Linköping Uni., Sweden)
- Dr. Eiichi Funatsu (Photonic LSI Technology Group, Mitsubishi Electric Corp., Japan)
- Dr. Ran Ginosar (Technion, Israel)
- Dr. Mats Gvikstorp (R&D Manager, Integrated Vision Products AB, Sweden)
- Dr. Takayuki Hamamoto (The Science University of Tokyo, Japan)
- Dr. Giacomo Indiveri (Caltech, USA)
- Prof. Eberhard Lange (Advanced LSI System Technology Department, Mitsubishi Electric Corporation, Japan)
- Dr. John Lazzaro (Berkeley & Caltech, USA)
- Ms. Shih-Chii Liu (Caltech, USA)
- Luc Le Pape (Universite de Bretagne Occidentale, France)
- Dr. Richard Lyon (Apple Computer, USA)
- Dr. Richard Meitzler (Johns Hopkins Uni., USA)
- Dr. Fernando Pardo (Diseño Electrónico y Circuitos VLSI, Universidad de Valencia, Spain)
- Prof. Robert Pinter (Washington Uni., USA)
- Dr. John Platt (Director of Research, Synaptics Inc., USA)
- Dr. Elisenda Roca (Centro Nacional de Microelectronica (CNM), Spain)
- Prof. Giulio Sandini (DIST - University of Genova, Italy)
- Rahul Sarpeshkar (Caltech, USA)
- Dr. Kim Strohbehn (Johns Hopkins Uni., USA)
- Prof. Marek Syrzycki (Simon Fraser Uni., Canada)
- Prof. Marc Tremblay (Laval Uni., Canada)
- Dr. Rudi Wiedemann (President/CEO of Silicon Vision Inc.)
- Robert Wodnicki (McGill Uni., Canada)

Alireza Moini
Chapter 2

Spatial Image Processing Vision Chips

2.1 Introduction

This chapter describes vision chips that implement only a spatial image processing function, from simple local smoothing operations to more complicated and global object orientation detection. Several different categories can be easily recognized among these vision chips.

A majority of spatial image processing chips, which have been dubbed *silicon retinas*, are based on models of the vertebrate retina. Some of the general characteristics of the vertebrate retina that have received considerable attention are adaptation to local and global light intensity, and edge enhancement. Various models have been proposed for the form and function of the retina, such as Laplacian of Gaussian (LOG), Difference of Gaussian (DOG), a direct derivative of the biharmonic equation, and linear and multiplicative lateral inhibition. Not surprisingly, the convolution kernel in all of these models has the Mexican-hat shape shown in Figure 2.1, though the underlying mathematical or biological theories may be quite different\(^1\). Which of these models best approximates the function of the retina remains an open question, pending more experience with the models and with the retina itself.

Figure 2.1: The Mexican hat. A generic kernel with different explanations and models.

Gaussian filtering plays an important role in most of the models used in implementing silicon retinas. The smoothing operation performed at any stage, and especially at the front-end, may help in reducing the noise. In some silicon retinas Gaussian filtering is followed by a subtraction or division stage, to enhance the edges and make the image invariant to the local intensity, over a neighborhood determined by the characteristics of the Gaussian filtering. In many silicon retinas a simple 1-D or 2-D resistive network serves as the basic element for approximating the Gaussian smoothing function. Only one implementation utilizes a more accurate approximation to the Gaussian filtering [Kobayashi et al. 95b].

\(^1\)We should remember that all these models have been obtained under various assumptions to regularize the specific problem or to simplify the model.

Another group of spatial processing vision chips target more global features of the image, such as the object position and orientation chip [Standley 91b] or the centroid computation chip [Deweerth 92].

Foveated sensors constitute another group of spatial vision chips. In these sensors the physical size and placement of the photodetectors form a log-polar mapping on the image. Log-polar mapping is rotation and scale invariant, with a high resolution in the centre, and logarithmically decreasing resolution off the centre.
2.2 Mahowald and Mead’s silicon retina

Mahowald’s silicon retina chip is among the first vision chips which implemented a biological facet of vision on silicon [Mahowald 94a, Mead 89b]. The computation performed by Mahowald’s silicon retina is based on models of computation in distal layers of the vertebrate retina, which include the cones, the horizontal cells, and the bipolar cells. The cones are the light detectors. The horizontal cells average the outputs of the cones spatially and temporally. Bipolar cells detect the difference between the averaged output of the horizontal cells and the input.

In this silicon retina the cones have been implemented using parasitic phototransistors and MOS-diode logarithmic current to voltage converters. Averaging is performed using a hexagonal network of active resistors as shown in Figure 2.2. The resistors are implemented using the horizontal resistor described in [Mead 89b]. The shape of the smoothing operation performed by the resistive network is similar to the charge distribution in semiconductors, and is an exponential function. The smoothing factor depends on the value of the resistors (or diffusion constant in semiconductors).
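The exponential shape of the smoothing kernel can be illustrated with a minimal sketch. It assumes an idealized passive 1D ladder (a lateral conductance between neighbouring nodes and a leak conductance from each node to ground) rather than the hexagonal network of active resistors used on the chip. The function names and the conductance values are hypothetical.

```python
import math

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[i] multiplies x[i-1] and upper[i] multiplies x[i+1] in row i.
    """
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def ladder_response(n_nodes, g_lateral, g_leak, source_index, current=1.0):
    """Node voltages of the ladder for a current injected at one node."""
    lower = [-g_lateral] * n_nodes
    upper = [-g_lateral] * n_nodes
    diag = [g_leak + 2.0 * g_lateral] * n_nodes
    diag[0] = diag[-1] = g_leak + g_lateral  # end nodes have one neighbour
    rhs = [0.0] * n_nodes
    rhs[source_index] = current
    return solve_tridiagonal(lower, diag, upper, rhs)

# Assumed (illustrative) conductances in arbitrary units.
G_LATERAL, G_LEAK = 1.0, 0.1
v = ladder_response(201, G_LATERAL, G_LEAK, source_index=100)

# Away from the source each node is a fixed fraction of its neighbour,
# which is the signature of an exponential decay.
measured_ratio = v[105] / v[104]
a = G_LEAK / G_LATERAL
predicted_ratio = 1.0 + a / 2.0 - math.sqrt(a + a * a / 4.0)
print(measured_ratio, predicted_ratio)
```

In this model a smaller leak-to-lateral conductance ratio gives a ratio closer to one, i.e. a wider smoothing kernel.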

This silicon retina is in fact a simpler version of other silicon retinas, which will be described later and are referenced in [Andreou and Boahen 94b, Bair and Koch 91b]. In those implementations two separate smoothing networks with different smoothing constants are used. The corresponding outputs of the two smoothing networks are then compared using a differentiating function, such as division or subtraction. In Mahowald’s silicon retina only one smoothing network has been implemented. Yet it demonstrates many similarities between the signals obtained from a real retina (of a mud puppy) and those obtained from the silicon retina.

Figure 2.2: Architecture of Mahowald’s silicon retina.
2.3 Mead’s adaptive retina

Mead’s adaptive retina [Mead 89a] is an enhanced implementation of Mahowald’s silicon retina described in Section 2.2. The chip uses floating gate MOSFETs (FGMOS) as feedback elements to correct the offset and mismatch between transistors.

Figure 2.3 shows two circuits with and without the FGMOS transistor. The retina chip using the circuit in Figure 2.3-a proved very sensitive to offsets: the output voltages of many pixels were stuck at the Vdd or Gnd supply voltage under uniform illumination. The reason is that a small offset in OTA1 is amplified by the inverting amplifier, so that the output saturates to one of the supply rails.

In order to mitigate this problem a feedback loop has been constructed to compensate for the effect of mismatch by changing the effective threshold voltage of transistor M2. This has been realized by the UV activated coupler, which is a simple poly1-poly2 structure. When the poly1-poly2 structure is exposed to UV light, the feedback loop is closed and the floating gate settles at a voltage which holds the output voltage at a level that depends on the input current (if the coupler becomes a short circuit, the pull-up circuit reduces to two simple stacked MOS diodes).

Figure 2.3: a) the circuit without FGMOS, b) the circuit with FGMOS.
2.4 Mahowald and Delbrück’s stereo matching chips

In [Mahowald and Delbrück 89] Mahowald and Delbrück present two stereo matching chips which use static and dynamic features of the image, respectively. Both chips use Marr and Poggio’s algorithm for stereo matching of the left and right image planes [Marr and Poggio 76], and compute disparities on nine disparity planes. The chip architecture is shown in Figure 2.4, which illustrates only three disparity planes. In the first chip, the retina elements are a 1D version of the 2D Mahowald-Mead retina described in Section 2.2 [Mahowald 94a]. The outputs of the left and right retinas are multiplied together using a four-quadrant Gilbert multiplier and provide the input for the correlator. In the second chip the retina elements are not connected together and are based on the time-derivative pixel circuit, which is capacitively coupled to a rectifier. The block diagrams of the correlator input circuitry in both chips are shown in Figure 2.5.

The correlation circuitry after node “X” in the correlator box is similar in both chips and is shown in Figure 2.6. The final outputs of the chips are the voltages at the Output nodes of the correlator circuits.

The chips have 40 pixels in a 2μm CMOS process. Experimental results are provided in [Mahowald and Delbrück 89].

Figure 2.4: Architecture of Mahowald-Delbrück’s stereo matching chip. Excitation in a disparity plane is done by the resistive elements. The inhibitory connections from neighboring disparity planes are not shown.
Figure 2.5: Schematic diagram of correlator input for a) the static input image chip, and b) the dynamic input image chip.

Figure 2.6: Simplified circuit of the correlator.
2.5 Bernard et al.’s Boolean artificial retina

Bernard et al. describe an artificial retina in [Bernard et al. 93a]. The main difference between this retina and other implementations is that the image is digitized at the very first stage, and processing is performed by Boolean operators acting at the pixel level. The main advantage of this digital retina over analog approaches is its programmability for performing different tasks with the same hardware.

Photodetection has been realized using photocurrent integration followed by thresholding. Therefore the signal becomes digital right at the detector level. The rest of the implementation concerns the design of digital Boolean processors in an architecture called neighborhood combinatorial processing (NCP). A partial list of the implemented Boolean operations is: shift-up, shift-down, shift-left, shift-right, circular permutation and inversion, copy, inverting copy, conjunction, conjunction and inversion, writing the photodiode, and reading the photodiode. These instructions are coded into a pseudo-static digital circuit, which uses six control signals. The architecture of Bernard’s digital retina, indicating its pixel level interaction, is illustrated in Figure 2.7. Further details about the design of this digital circuit can be found in [Bernard et al. 93a].

By combining simple instructions more complicated operations can be performed. Operations such as edge detection, motion detection, and halftoning have been successfully demonstrated using the chip.
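How shift, conjunction, and inversion can combine into an edge detector is sketched below on a binary image stored as nested lists. This is only an illustration of the principle and not the instruction sequence used on the chip. The helper names and the zero fill at the image border are assumptions.

```python
def shift(img, dy, dx):
    """Return img shifted by (dy, dx); vacated pixels are filled with 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            sy, sx = y - dy, x - dx
            if 0 <= sy < h and 0 <= sx < w:
                out[y][x] = img[sy][sx]
    return out

def conjunction(a, b):
    """Pixel-wise AND of two binary images."""
    return [[p & q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]

def inversion(a):
    """Pixel-wise NOT of a binary image."""
    return [[1 - p for p in row] for row in a]

def inner_boundary(img):
    """Pixels that are set but have at least one unset 4-neighbour."""
    eroded = img
    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        eroded = conjunction(eroded, shift(img, dy, dx))
    return conjunction(img, inversion(eroded))

# A 3x3 square of set pixels inside a 5x5 image.
square = [[1 if 1 <= y <= 3 and 1 <= x <= 3 else 0 for x in range(5)]
          for y in range(5)]
edges = inner_boundary(square)
for row in edges:
    print(row)
```

Only the centre pixel of the square has all four neighbours set, so it is the only set pixel removed from the result.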

Several versions of the chip have been designed and fabricated. One of them occupies an area of 50 mm² and contains a 65 × 76 array of photodetectors and Boolean processors, using a 2μm CMOS process.

Figure 2.7: Architecture of Bernard et al.’s digital retina.
2.6 Andreou and Boahen’s silicon retina

This silicon retina is an implementation of the outer-plexiform layer of retinal processing [Andreou and Boahen 94b, Boahen and Andreou 92]. The design has a distinctive feature that separates it from all other silicon retinas: it uses a very compact circuit, which has enabled the realization of a 210×230 array of image sensors and processing elements with about 590,000 transistors, the largest among all reported vision chips.

This silicon retina uses a diffusive smoothing network shown in Figure 2.8 [Andreou and Boahen 94b]. The function of this one-dimensional network can be written as

\[
\frac{dQ_n}{dt} = D \left( [Q_{n+1} - Q_n] - [Q_n - Q_{n-1}] \right)
\]

Here \(dQ_n/dt\) is the current supplied by the network to node \(n\), and \(D\) is the diffusion constant of the network, which depends on the transistor parameters and the voltage \(V_C\).
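The behaviour of this equation can be explored numerically. The sketch below integrates it with an explicit Euler step and assumes reflecting ends, so that no charge leaves the network. The step size, the value of \(D\), and the function name are illustrative assumptions, not parameters of the chip.

```python
def diffuse(q, d, dt, steps):
    """Integrate dQ_n/dt = D([Q_{n+1} - Q_n] - [Q_n - Q_{n-1}]) in time.

    The ends are reflecting: a missing neighbour is replaced by the node
    itself, so that no charge flows out of the network.
    """
    q = list(q)
    n = len(q)
    for _ in range(steps):
        new_q = q[:]
        for i in range(n):
            left = q[i - 1] if i > 0 else q[i]
            right = q[i + 1] if i < n - 1 else q[i]
            new_q[i] = q[i] + dt * d * ((right - q[i]) - (q[i] - left))
        q = new_q
    return q

# A unit charge placed on the centre node spreads out over time.
initial = [0.0] * 41
initial[20] = 1.0
# dt * d must stay at or below 0.5 for this explicit scheme to be stable.
spread = diffuse(initial, d=1.0, dt=0.2, steps=50)
print(sum(spread), spread[20])
```

The total charge is conserved while the peak is lowered and widened, which is the smoothing action of the network.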

Andreou and Boahen have encapsulated the model of the retina in a neat and small circuit illustrated in Figure 2.9. This circuit includes two layers of the diffusive network. The upper layer corresponds to horizontal cells in retina and the lower layer to cones. Horizontal N-channel transistors model chemical synapses.

The function of the network can be approximated by the biharmonic equation

\[
gh \nabla^2 \nabla^2 I_h(x, y) + I_h(x, y) = I(x_i, y_i)
\]

\[I_{\text{out}}(x_i, y_i) = \nabla^2 I_h(x_i, y_i)\]

where \(g\) and \(h\) are proportional to the diffusivity of the upper and lower smoothing layers, respectively. More details about the function of the circuit can be found in the relevant references [Andreou and Boahen 94b, Boahen and Andreou 92].

Several versions of the 2D chip have been implemented using the circuit shown in Figure 2.8. All the 2D chips use a hexagonal network with six-neighbor connections. The largest chip occupies an area of 9.5mm×9.3mm, in a 1.2μm CMOS process with two layers of metal and poly. A cell size of about 39.6μm × 43.8μm has been achieved for this implementation. Under typical conditions the chip dissipates 50mW.

Figure 2.8: The diffusive network used in Andreou-Boahen’s silicon retina.
Figure 2.9: Schematic of the 1D silicon retina modeling the outer-plexiform of retinal processing.

2.7 Kobayashi et al.’s image Gaussian filter

The spatial image Gaussian filter designed by Kobayashi et al. [Kobayashi et al. 95b] uses a hexagonal resistive network. It uses negative resistors implemented using negative impedance converters (NIC) to obtain a better approximation to the Gaussian function than that obtained using simple resistive networks (an exponentially decaying function similar to the charge distribution in bulk semiconductors). The 2D resistor connection is shown in Figure 2.10. In order to get the desired Gaussian characteristic, negative resistances are connected to the second-nearest neighbors, with four times the value of the resistors connecting the first-nearest neighbors. The value “4” has been derived from the discretization of the error energy function, $E$, for optimizing the fitting function $U$, augmented with a penalty term and described by

$$E = \sum_i (U_i - V_i)^2 + \lambda \int \left( \frac{d^2 U}{dx^2} \right)^2 dx$$

where the first term is the mean square error between the fitting function $U$ and the input $V$, and the second term is the penalty term. By discretizing this equation and finding its minimum, a relation between the fitting function and the input can be found.

$$0 = (U_i - V_i) + \lambda(6U_i - 4(U_{i-1} + U_{i+1}) + (U_{i-2} + U_{i+2}))$$
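The discretized relation can be checked with a short numerical sketch that solves it for an impulse input in one dimension. The matrix is built from second differences, which reproduces the 6, -4, 1 weights at interior nodes. The treatment of the ends of the array, the value of $\lambda$, and the function names are assumptions made for illustration.

```python
def smoothing_matrix(n, lam):
    """Return A = I + lam * D2^T D2, where D2 takes second differences."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(1, n - 1):
        idx = (k - 1, k, k + 1)
        coef = (1.0, -2.0, 1.0)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                a[i][j] += lam * ci * cj
    return a

def solve(a, b):
    """Gaussian elimination with partial pivoting on a dense system."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x

N, LAM = 41, 4.0            # illustrative size and penalty weight
v_in = [0.0] * N
v_in[N // 2] = 1.0          # impulse input V
u = solve(smoothing_matrix(N, LAM), v_in)

def residual(i):
    """Left-hand side of the discretized relation at interior node i."""
    return (u[i] - v_in[i]) + LAM * (6 * u[i] - 4 * (u[i - 1] + u[i + 1])
                                     + (u[i - 2] + u[i + 2]))

print(max(abs(residual(i)) for i in range(2, N - 2)))
```

The solution `u` is the impulse response of the regularized fit, and `residual` confirms that it satisfies the relation with the weights 6, 4, and 1 quoted above.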

The circuit in Figure 2.11 has been used in the implementation of the NIC elements.

The chip has a 45×40 array of photodetectors and resistive grid on a 7.9×9.2mm chip using a 2μm CMOS process.
Figure 2.10: Kobayashi et al.’s resistive network using negative impedance converters for implementing a Gaussian filter.

\[ I_1 = -(V_1 - V_2)/R_2 \]
\[ I_2 = -(V_2 - V_1)/R_2 \]

Figure 2.11: Negative impedance converter (NIC) used in Kobayashi et al.’s image Gaussian filter. a) obtaining a negative impedance using a resistor and two NIC circuits. b) the usage of the NIC circuit in the chip. One NIC has been used for six resistors.
2.8 PASIC sensor from Linköping University

The “Processor ADC and Sensor Integrated Circuit” (PASIC) as the name suggests consists of a sensor array, A/D converters, and processors [Chen et al. 90b, Chen et al. 90a, Chen et al. 90c]. Each column has its own ADC and processor. The architecture of PASIC is shown in Figure 2.12.

A/D conversion is performed in parallel for each selected row. A counter starts from zero and counts up, driving a DAC. Whenever the voltage from the DAC reaches the output voltage of a cell, the counter value is stored in the associated register.
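The conversion scheme can be sketched as follows for one row of cells. The full-scale voltage, the resolution, the clamping of out-of-range inputs, and the function name are assumptions for illustration and are not taken from the PASIC design.

```python
def ramp_adc(cell_voltages, v_full_scale, bits):
    """Convert all cell voltages of one row in parallel with a shared ramp.

    A counter drives a DAC; each column latches the counter value at the
    first step where the DAC voltage reaches its cell voltage.
    """
    levels = 1 << bits
    registers = [None] * len(cell_voltages)
    for count in range(levels):
        v_dac = v_full_scale * count / (levels - 1)
        for i, v in enumerate(cell_voltages):
            if registers[i] is None and v_dac >= v:
                registers[i] = count
    # Assumption: inputs above full scale are clamped to the maximum code.
    return [levels - 1 if r is None else r for r in registers]

codes = ramp_adc([0.0, 1.0, 2.5, 5.0], v_full_scale=5.0, bits=8)
print(codes)
```

All columns share one counter and one DAC, so the conversion time is set by the number of counter steps and not by the number of columns.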

The processing elements consist of three parts: a bi-directional parallel shift register, an ALU, and a memory. These modules communicate with each other through a 1-bit bus. Operations between these modules are performed on a single bit at a time. Therefore each instruction requires several clock cycles to complete.

Using this bit-serial processor approach several simple image processing operations, such as binary image dilation and erosion, and more complicated operations, such as convolution and histogram collection, have been implemented.

The sensor array in PASIC has 128×128 photodetectors. The chip occupies an area of about 9mm×11mm.
2.9 MAPP2200 sensor from IVP

The Matrix Array Picture Processor (MAPP) sensor array, which has a very similar architecture to the PASIC sensor described in section 2.8, consists of a 2D sensor array and a SIMD processor array [Fochheimer et al. 92, Aström 93]. The architecture of MAPP has borrowed many of the concepts of the PASIC sensor, and has improved some of the logic in the ALU by dividing it into three units: a global logic unit (GLU) for marking specific processing elements, a neighborhood logic unit (NLU) for performing operations such as left and right edge detection, and a point logic unit (PLU) for performing general arithmetic and logical operations.

MAPP2200 has been commercialized by Integrated Vision Products AB (IVP) since 1991. A system consisting of MAPP2200 camera and assembler is available from IVP. The MAPP2200 has 256×256 sensors. In a 1.6μm CMOS process the chip occupies an area of 10mm×15mm.

2.10 Fochheimer-Åström’s NSIP sensor

The overall architecture of the Near Sensor Image Processing (NSIP) sensor is very similar to that of PASIC (described in section 2.8). However, it embeds an interesting function in each pixel, which performs both an A-to-D conversion and a 1/x compression [Fochheimer and Aström 94, Fochheimer and Aström 92, Aström et al. 96, Aström 93]. The schematic diagram of a pixel is shown in Figure 2.13. The photodetector works in integration mode. By applying the Reset signal the voltage at the input node \( V_{in} \) is precharged to \( V_{reset} \). By turning off the resetting transistor, \( V_{in} \) charges up and when it reaches the reference voltage \( V_{ref} \) the output voltage becomes high. The time that it takes from the onset of the charging until the output voltage becomes high is related to the input light intensity (input photocurrent) by:

\[
t = \frac{C(V_{reset} - V_{ref})}{I_{photo}}
\]  

Therefore, if the outputs of the detectors are sampled at regular intervals after resetting the sensor array, the intensity at each detector can be derived from the sample number at which the output of the detector has become high.
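The relation between photocurrent and sample number can be made concrete with a small sketch. The capacitance, the voltages, the sampling period, and the function names are hypothetical values chosen for illustration only.

```python
import math

def threshold_time(i_photo, c=1e-12, v_reset=3.0, v_ref=1.0):
    """Time for the pixel node to reach V_ref: t = C (V_reset - V_ref) / I."""
    return c * (v_reset - v_ref) / i_photo

def first_high_sample(i_photo, sample_period, **pixel):
    """Index of the first sample at which the pixel output reads high."""
    return math.ceil(threshold_time(i_photo, **pixel) / sample_period)

# Doubling the photocurrent halves the time to threshold (1/x behaviour).
t_dim = threshold_time(1e-9)
t_bright = threshold_time(2e-9)
print(t_dim, t_bright)

# Brighter pixels switch at earlier sample numbers.
k_dim = first_high_sample(1e-9, sample_period=3e-4)
k_bright = first_high_sample(2e-9, sample_period=3e-4)
print(k_dim, k_bright)
```

Because the sample number is inversely related to the photocurrent, uniform sampling gives fine intensity resolution for dim pixels and coarse resolution for bright ones.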

For imaging applications the readout mechanism would be complicated, as all the detectors should be read at small sampling periods and their status should be registered in a memory. However, for some image processing tasks, for example finding the position of the maximum intensity, or detecting positive or negative gradients, this method of reading the output at small periods and performing micro-instructions on the outputs proves more economical than the traditional methods.
2.11 Sandini et al.’s foveated CCD chip

This foveated sensor has been designed by several groups from the University of Genoa, Italy, the University of Pennsylvania, USA, and Scuola Superiore S. Anna of Pisa, and has been fabricated by IMEC in Leuven, Belgium [van der Spiegel et al. 89, Pardo and Martinuzzi 94]. It features a unique concept in the VLSI implementation of a vision chip. The foveated chip, which uses a CCD process, mimics the physically foveated (versus optically foveated, as in some birds) human retina. Foveated vision is known to significantly reduce the amount of information passed to subsequent processing layers, and therefore lends itself to image processing and pattern recognition tasks which are currently performed using uniformly spaced imagers. Foveated vision, however, has evolved concurrently with the eye motor system, where the fovea focuses on areas of interest. This can be best utilized for robotic applications in which the low resolution periphery of the fovea finds areas of interest, and then directs the foveated part to get the details of those areas.

The chip has a foveated rectangular region in the middle with high resolution and a circular outer layer with decreasing resolution. The chip floor plan in Figure 2.14 shows different regions of the chip. In the circular region the chip implements a log-polar mapping of the Cartesian coordinates. This mapping provides a scale and rotation invariant transformation.
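The invariance property can be illustrated with a minimal sketch of the mapping itself: scaling a point about the centre shifts only the log-radius coordinate, and rotating it shifts only the angle coordinate. The function name and the reference radius `r0` are assumptions, and the sketch says nothing about the actual pixel layout of the chip.

```python
import math

def log_polar(x, y, r0=1.0):
    """Map Cartesian (x, y) to (log(r / r0), theta)."""
    r = math.hypot(x, y)
    return math.log(r / r0), math.atan2(y, x)

xi, eta = log_polar(3.0, 4.0)

# Scaling by a factor of 2 about the centre shifts xi by log(2) only.
xi_scaled, eta_scaled = log_polar(6.0, 8.0)

# Rotating by 0.3 rad about the centre shifts eta by 0.3 only.
phi = 0.3
x_rot = 3.0 * math.cos(phi) - 4.0 * math.sin(phi)
y_rot = 3.0 * math.sin(phi) + 4.0 * math.cos(phi)
xi_rot, eta_rot = log_polar(x_rot, y_rot)

print(xi_scaled - xi, eta_scaled - eta)
print(xi_rot - xi, eta_rot - eta)
```

A pattern that is scaled or rotated about the centre therefore appears in the mapped image as the same pattern translated along one axis.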

The chip has been fabricated using a triple-poly buried channel CCD process provided by IMEC. The rectangular inner region has 102 photodetectors. There are 30 eccentric circular layers in the peripheral part, each having 64 photodetectors. A part of the circles has been sliced to allow the interconnection of clock and control signals. The chip area is 11mm × 11mm. In references [van der Spiegel et al. 89, Pardo and Martinuzzi 94] other aspects of the design, such as read-out structures, clock generation, simple theories about the fovea, and hardware interface to the chip are described.

Other features of this chip are:

- 8 mm diameter.
- 76 circles of 128 pixels max.
- 56 circles in "retina", with 128 pixels/circle
- 20 circles in "fovea" with fewer pixels:
  - 1 x 1 pixel
  - 1 x 4
  - 1 x 8
  - 2 x 16
  - 5 x 32
  - 10 x 64
- pixels have a continuous operation in time (non-integrating!)
- logarithmic intensity to voltage conversion.

Figure 2.14: Photograph of the foveated CCD retina.
2.12 IMEC-IBIDEM’s foveated CMOS chip

The foveated CMOS chip designed by the IMEC and IBIDEM consortium [Ferrari et al. 95b, Ferrari et al. 95a, Pardo 94], and dubbed “FUGA”, is similar to the CCD fovea described in Section 2.11 [van der Spiegel et al. 89]. The rectangularly spaced foveated region of the CCD retina has been replaced by reconfiguring the spatial placement of the photodetectors. As a result of this redesign, the discontinuity between the fovea and the peripheral region has been removed. In the CCD retina a blind sliced region (for routing the clock and control signals) exists. In the FUGA retina the need for this region has been removed by routing the signals through radial channels. Figure 2.15 shows a photograph of the central region of the foveated retina. Several versions of the FUGA chip with different sizes have been designed and manufactured by IMEC.

Figure 2.15: Central region of FUGA18 foveated CMOS retina.
2.13 Wodnicki et al.'s foveated CMOS sensor

Wodnicki et al. have designed and fabricated a foveated CMOS sensor [Wodnicki et al. 95], which has a high resolution central region and a peripheral region with decreasing resolution. In the central region photodetectors are uniformly spaced in a rectangle and in the periphery are placed in a circular array (See Figure 2.16). Photodetectors have been realized using circular parasitic well diodes operating in integrating mode. The area of photodetectors in the circular outer region increases exponentially, resulting in the log-polar mapping, which is known to be both scale and rotation invariant.

The chip has been fabricated in a 1.2μm CMOS process. It has 16 circular layers in the periphery. The chip size is 4.8mm×4.8mm. It uses a 3.3 V supply voltage and dissipates about 10 mW.

Figure 2.16: Simplified structure of Wodnicki et al.’s foveated CMOS sensor and the test system.
2.14 Standley’s orientation detection chip

This vision chip detects the position and orientation of an object [Standley 91b]. The chip first computes moments of the image using a resistive grid. These moments are then used to find orientation and position of an object in the image.

The zeroth and first order moments of an object are defined by

\[ M_0 = \int_0^{x_{max}} \int_0^{y_{max}} m(x, y) dx dy \]
\[ M_{1x} = \int_0^{x_{max}} \int_0^{y_{max}} x m(x, y) dx dy \]
\[ M_{1y} = \int_0^{x_{max}} \int_0^{y_{max}} y m(x, y) dx dy \]

These moments can be computed using a resistive network whose inputs are currents injected into the network and whose outputs are the currents flowing to the periphery of the grid on the four sides of the array. The underlying theory and algebra can be found in Standley’s thesis [Standley 91a]. In this process the dimension of the data is reduced by one order (from 2D to four 1D). The data reduction is in fact done twice: once from within the 2D array to the 1D boundaries, and then from the 1D boundaries to the four corners.

The chip architecture is shown in Figure 2.17. The input to the 30×30 resistive grid array is provided by a 29×29 array of photodetectors. The resistive grid is implemented using passive polysilicon resistors. Photodetectors are parasitic bipolar transistors. The photo-generated currents are thresholded to eliminate the slow response time of dark pixels. The boundary of the 2D array is connected to a virtual ground. The current flowing into the boundary of the 2D array is sensed and buffered by a 1D array of current sense and buffer circuitry. The buffered current is then switched into one of two 1D resistive grids, one with uniform resistors and the other with quadratic resistors, which are linearly graded with respect to their position from the origin (lower-left corner of the chip). The ends of the 1D resistive grids are finally connected to virtual grounds, where the currents can be measured. From the measured currents, the first moment of the image, for example, can be obtained using the following equations.

\[ \bar{x} = x_{max} \left( \frac{i_2 + i_3}{i_1 + i_2 + i_3 + i_4} \right) \]
\[ \bar{y} = y_{max} \left( \frac{i_1 + i_2}{i_1 + i_2 + i_3 + i_4} \right) \]
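The principle behind these ratios can be illustrated in one dimension. In a uniform resistive line with both ends held at ground, a current injected at a node divides between the two ends in proportion to its position, so the ratio of the end currents encodes the centroid. The sketch below solves the node equations directly. It is a 1D illustration only, not a model of the 2D grid on the chip, and the function name and test currents are assumptions.

```python
def end_currents(injected, r_segment=1.0):
    """Currents leaving both grounded ends of a uniform resistive line.

    Node k (k = 1..N) sits k segments from the left end. The node
    equations (2 V[k] - V[k-1] - V[k+1]) / R = I[k] are solved with the
    Thomas algorithm for tridiagonal systems.
    """
    n = len(injected)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = -0.5
    d[0] = injected[0] * r_segment / 2.0
    for k in range(1, n):
        denom = 2.0 + c[k - 1]
        c[k] = -1.0 / denom
        d[k] = (injected[k] * r_segment + d[k - 1]) / denom
    v = [0.0] * n
    v[-1] = d[-1]
    for k in range(n - 2, -1, -1):
        v[k] = d[k] - c[k] * v[k + 1]
    return v[0] / r_segment, v[-1] / r_segment

# Nine nodes at positions 1..9; the grounded ends are at 0 and x_max = 10.
currents = [0.0, 0.0, 3.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
i_left, i_right = end_currents(currents)
x_max = len(currents) + 1
x_bar = x_max * i_right / (i_left + i_right)
print(x_bar)
```

The injected currents sit at positions 3 and 5 with weights 3 and 1, so the recovered value can be compared with the weighted mean position (3·3 + 1·5)/4.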

The chip has been fabricated in a 2μm CMOS process in an area of 7.9mm×9.2mm, and contains an array of 29×29 cells occupying a total area of 5500μm×5500μm.
Figure 2.17: Architecture of Standley’s vision chip.
2.15 Harris et al.’s Resistive Fuse Vision Chip

In [Harris et al. 90, Harris 91, Harris et al. 89] the concept of resistive fuses and a vision chip based on the resistive fuse idea are described. Resistive fuses are two-port nonlinear elements in which the I-V relationship is linear for small values of the voltage across the element, and the current falls as the voltage increases further. If this element is used in resistive smoothing networks instead of linear resistors, the network performs a smoothing operation for small differences between the inputs. For abrupt spatial changes and large differences, however, the resistor is virtually turned off. On the assumption that large differences only occur where discontinuities exist, the resistive fuse network is capable of segmenting regions separated by abrupt intensity changes, while smoothing other regions with less variation. It should be noted that due to the introduction of a nonlinear element with a negative-slope I-V region, the network can have several local minima and may not converge to the global solution. A simple remedy is to change the biasing of the fuse so as to start from the original linear resistor and gradually vary the bias to end up in the global minimum.
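The segmenting behaviour can be sketched with a 1D network in which each node is tied to its input through a unit conductance and to its neighbours through a fuse. The fuse characteristic used here (a linear term multiplied by a Gaussian roll-off) is a hypothetical smooth stand-in for the measured I-V curve. The conductance, the threshold, the step size, and the function names are likewise assumptions. The network is simply relaxed from the input, without the gradual bias variation mentioned above.

```python
import math

def fuse_current(v, g=2.0, v_th=0.3):
    """Assumed fuse law: linear for small |v|, falling off for large |v|."""
    return g * v * math.exp(-(v / v_th) ** 2)

def linear_current(v, g=2.0):
    """Ordinary resistor with the same small-signal conductance."""
    return g * v

def relax(data, branch_current, dt=0.1, steps=500):
    """Relax the node voltages u towards the equilibrium of the network."""
    u = list(data)
    n = len(u)
    for _ in range(steps):
        new_u = u[:]
        for i in range(n):
            total = data[i] - u[i]              # unit-conductance data term
            if i > 0:
                total += branch_current(u[i - 1] - u[i])
            if i < n - 1:
                total += branch_current(u[i + 1] - u[i])
            new_u[i] = u[i] + dt * total
        u = new_u
    return u

# A unit step between nodes 9 and 10 with a small alternating disturbance.
data = [(0.0 if i < 10 else 1.0) + 0.05 * (-1) ** i for i in range(20)]
fused = relax(data, fuse_current)
linear = relax(data, linear_current)
print(fused[10] - fused[9], linear[10] - linear[9])
```

With the fuse law the disturbance inside each region is smoothed while the step is preserved, whereas the linear network blurs the step as well.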

The circuit and I-V characteristics of the fuse are shown in Figure 2.18. The slope of the I-V curve in the linear region, and the threshold value at which the curve starts falling down are determined by the bias voltages $V_A$, $V_B$, and $V_R$. It can be easily seen that the upper portion of the circuit is the same as the horizontal resistor described in [Mead 89b].

A similar concept has been followed in [Yu et al. 92]. A new circuit introduced in [Yu et al. 92] is shown in Figure 2.19. Compared with the previous circuit it uses fewer transistors, but the linear range of the resistor is small and uncontrollable which makes it unattractive. The transition from linear to cut-off region is also sharper in this circuit.

Harris et al. have designed and fabricated a $20 \times 20$ array of this fuse network and report promising results from some simple tests.
Figure 2.18: a) Harris et al.’s resistive fuse circuit. b) the I-V characteristics of the fuse.
Figure 2.19: Yu et al.’s resistive fuse circuit.
2.16 DeWeerth’s Localization and Centroid Computation Chip

DeWeerth has implemented a centroid detection chip [Dewerth 92] based on an aggregation network shown in Figure 2.20. Using this circuit an output current of the form

\[ I_{O1} - I_{O2} = \sum_{n} i_n \tanh \left( \frac{\Delta V_n}{2U_t} \right) \]

is obtained. In order to use this circuit for spatial centroid detection, a spatially-sweeping reference voltage is produced at one input of the differential pairs. This is simply done by a resistive voltage divider with its ends connected to the reference voltages \( V_0 \) and \( V_N \). The other inputs of the differential pairs are all connected together and to the output of the circuit. Photocurrents are presented as the biasing currents of the differential pairs. In the actual implementation described in [Dewerth 92] the input transistors of the differential pairs are realized using bipolar transistors to reduce the effects of device mismatch. Polysilicon resistors have been used for the voltage divider.

Obviously, the nonlinearity of the \( \tanh \) function affects the operation, if proper assumptions or constraints are not made. It is reasonably assumed that the voltage difference across each resistor, \( (V_N - V_0)/N \), is very small. This can easily be satisfied by a choice of reference voltages. An analysis for a simple case of constant background illumination and constant-width-and-intensity object is given in [Dewerth 92].
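The effect of the \( \tanh \) nonlinearity can be examined with a small numerical sketch. It assumes that the feedback from the output settles where the differential output current is zero, and finds that point by bisection. The thermal voltage, the reference voltages, the array length, and the function names are illustrative assumptions.

```python
import math

U_T = 0.02585  # assumed thermal voltage in volts (room temperature)

def centroid_voltage(photocurrents, v_0, v_n, iterations=60):
    """Output voltage at which sum(i_n * tanh(dV_n / (2 U_t))) is zero."""
    count = len(photocurrents)
    taps = [v_0 + (v_n - v_0) * k / (count - 1) for k in range(count)]

    def net_current(v_out):
        return sum(i * math.tanh((v - v_out) / (2.0 * U_T))
                   for i, v in zip(photocurrents, taps))

    lo, hi = min(v_0, v_n), max(v_0, v_n)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if net_current(mid) > 0.0:   # net current falls as v_out rises
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def centroid_position(photocurrents, v_0=0.0, v_n=0.01):
    """Convert the output voltage back to a position along the divider."""
    v_out = centroid_voltage(photocurrents, v_0, v_n)
    return (v_out - v_0) / (v_n - v_0) * (len(photocurrents) - 1)

# A uniform object covering pixels 10..14 on a dark background.
uniform_object = [1.0 if 10 <= k <= 14 else 0.0 for k in range(64)]
pos_uniform = centroid_position(uniform_object)

# Two spots of unequal intensity; the weighted mean position is 17.5.
two_spots = [0.0] * 64
two_spots[10], two_spots[20] = 1.0, 3.0
pos_spots = centroid_position(two_spots)
print(pos_uniform, pos_spots)
```

With a total reference span of 10 mV the arguments of the \( \tanh \) terms stay small, so the result stays close to the intensity-weighted mean position.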

The 160×160 array of this centroid detection chip has been realized in a 2\( \mu \)m BiCMOS process in an area of 6.8mm×6.9mm.

Figure 2.20: DeWeerth’s spatial aggregation circuit.
Figure 2.21: DeWeerth’s spatial centroid detection circuit.
2.17 Ward & Syrzycki’s Receptive Field Sensors

In [Ward and Syrzycki 95, Ward and Syrzycki 93, Ward et al. 93] Ward & Syrzycki describe vision sensors based on the receptive field concept. Receptive fields are regions over which several neighboring photoreceptors provide input to processing units. Therefore, a receptive field consists of several photodetectors, a processing unit, and an output unit. There may or may not be overlap between contiguous receptive fields. A sensor with non-overlapping receptive fields is shown in Figure 2.22. In this figure every group of nine receptors constitutes a receptive field.

Based on this concept a Sobel edge detector has been implemented in a chip, where the operators in $x$ and $y$ direction are given by

$$S_x = \begin{bmatrix} -1 & 2 & -1 \end{bmatrix} \quad S_y = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix} \quad (2.2)$$

The weighting required for the Sobel operators is done using current mirrors. In the implemented chip each edge operator collects input from its eight nearest neighbors.

The reference [Ward and Syrzycki 93] also provides information on a multi-sensitivity photodetector. This detector is a parasitic vertical bipolar transistor formed between diffusion, well, and substrate. By connecting a MOS transistor to the base and emitter of the bipolar transistor, the gain of the photodetector can be changed. By adding another bipolar transistor in a Darlington structure, the current gain can be boosted once more. However, the dynamics of such a structure degrade significantly at low light levels. The main advantage of this structure is that the output current can be limited to a range of a few decades, by activating both bipolar transistors at very low light intensities and deactivating both transistors at higher intensities. The cross section of this structure is shown in Figure 2.23.
Figure 2.22: Non-overlapping receptive fields.

Figure 2.23: Ward-Syrzycki’s multisensitivity photodetector.
2.18 Wu & Chiu’s 2D Silicon Retina

This silicon retina subtracts the original image from a spatially smoothed version of the image; it therefore eliminates the local average and enhances edges. This retina, however, uses the “well” resistance in the resistive network that performs the smoothing operation [Wu and Chiu 95, Wu and Chiu 92]. Each cell consists of two parasitic phototransistors: one which is properly isolated from other transistors, and one whose base is shared with its neighboring phototransistors of the same type. The photogenerated electron-hole pairs in the latter can therefore diffuse to its neighbors, resulting in a smoothing function. The simplified layout of four cells of this retina is shown in Figure 2.24.

Due to the large stray capacitance and resistances associated with the base of sharing phototransistors, the smoothing operation will have a delay. Hence, a crude motion detection is also obtained.

The major drawback of such a retina is the uncontrollability of both the smoothing constants and the delay, which heavily depend on the process used, the size, and the shape of the well regions being shared by the sharing phototransistors.

A 32×32 array of this silicon retina has been fabricated in a 0.8μm DPDM CMOS process. Each cell occupies an area of 60μm×60μm.

![Figure 2.24: Simplified layout of Wu-Chiu’s retina.](image-url)
2.19 Nilson et al.’s Shunting Inhibition Vision Chip

Lateral inhibition is one of the overlapping mechanisms used by many creatures to extend their visual capabilities without requiring additional processing by the brain. The vision chip designed by Nilson et al. implements a shunting inhibition network using current-mode circuits. Current multiplication is performed by a four-quadrant Gilbert multiplier. The schematic of the cell is shown in Figure 2.25. This vision chip, which uses a 0.7μm p-well CMOS process, has a one-dimensional array of 24 cells.
2.20 Keast & Sodini’s CCD/CMOS Imager and Processor

The 2D CCD/CMOS imager and processor designed by Keast and Sodini [Keast and Sodini 92, Keast and Sodini 93, Keast and Sodini 90] combines a CCD imager and CMOS processing blocks to perform smoothing and segmentation on the input image. A simplified schematic of several cells of the chip is shown in Figure 2.26. Each cell comprises a CCD photodetector and several other CCD elements. A CCD/CMOS circuit also computes the difference between two neighbors. If the difference does not exceed a predefined threshold, the gate of a mixing CCD device (shown in green color) is connected to a proper clock to perform a smoothing operation. Otherwise, the gate of the mixing CCD device is connected to another clock, as a result of which no smoothing is performed. Calculating the difference and comparing it with a threshold is done using the absolute value of difference circuit (AVD) described in section 4.2. The smoothing operation is performed by a fill-and-spill CCD circuit described in section 3.12.
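
The thresholded smoothing rule can be illustrated in one dimension. The sketch below is only a behavioural model under stated assumptions: charge mixing is modelled as replacing two neighbors by their mean, even and odd neighbor pairs are mixed alternately, and the iteration count is arbitrary. The name `segment_smooth` is hypothetical.

```python
def segment_smooth(pixels, threshold, iterations=50):
    """Average neighboring pixels only where their difference stays within the threshold."""
    vals = [float(p) for p in pixels]
    for _ in range(iterations):
        for start in (0, 1):  # alternate between even and odd neighbor pairs
            for i in range(start, len(vals) - 1, 2):
                # The difference test stands in for the AVD circuit and comparator.
                if abs(vals[i] - vals[i + 1]) <= threshold:
                    mean = 0.5 * (vals[i] + vals[i + 1])
                    vals[i] = vals[i + 1] = mean
    return vals
```

Applied to `[1.0, 1.2, 0.8, 5.0, 5.1, 4.9]` with a threshold of 1.0, the two regions are each flattened to their own mean while the step between them is preserved, which is the segmentation behaviour described above.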

The fabricated chip comprises a 4×4 array of cells. It is designed in a 2μm buried channel CCD/CMOS process.
Figure 2.26: The architecture of Keast-Sodini’s CCD/CMOS segmentation chip.
2.21 Mitsubishi Electric’s CMOS Artificial Retina with VSP

In conventional methods modulating an image with a desired kernel is performed after the photodetection stage. In Mitsubishi’s CMOS retina chips, the photocurrent output from photodetectors is directly modulated, therefore obviating post-detector processing [Funatsu et al. 94, Funatsu et al. 95a, Funatsu et al. 95b, Funatsu et al. 96]. GaAs based variable sensitivity photodetectors (VSP) have also been used in retina chips based on this concept [Nitta et al. 95, Lange et al. 93] (See section 3.1).

The circuit of a pixel is shown in Figure 2.27. The output of the photodetector determines the bias current of the OTA. The output current of the OTA depends on the input differential voltage and the bias current.

\[
I_{\text{out}} =
\begin{cases}
I_{\text{bias}} \tanh \dfrac{\Delta V_{\text{in}}}{2V_{\text{tanh}}} & \text{in subthreshold} \\[2ex]
\beta \Delta V \sqrt{\dfrac{2I_{\text{bias}}}{\beta} - \Delta V^2} & \text{in above-threshold}
\end{cases}
\tag{2.3}
\]

Therefore a simple and effective modulation of the input image can be obtained. The read-out mechanism selects several pixels in a column at the same time and applies proper voltages to the input of the corresponding OTAs, thereby convolving the image with a kernel. As the applied voltages can be selected arbitrarily, a vast number of image processing tasks, such as edge detection and smoothing, can be performed using this retina.
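
As an illustration of how the applied voltages set the kernel weights, the sketch below uses the subthreshold expression of equation (2.3). It assumes that the output currents of the selected pixels are summed on a shared line and that the scaling voltage is 25 mV; the names `control_voltage` and `column_output` are hypothetical and do not come from the Mitsubishi papers.

```python
import math

def control_voltage(weight, v_scale=0.025):
    """Differential voltage giving tanh(dV / (2 * v_scale)) == weight, for |weight| < 1."""
    if not -1.0 < weight < 1.0:
        raise ValueError("weight must lie strictly between -1 and 1")
    return 2.0 * v_scale * math.atanh(weight)

def column_output(photocurrents, control_voltages, v_scale=0.025):
    """Sum of subthreshold OTA outputs I_bias * tanh(dV / (2 * v_scale)) over selected pixels."""
    return sum(i * math.tanh(dv / (2.0 * v_scale))
               for i, dv in zip(photocurrents, control_voltages))
```

For instance, weights of -0.5 and 0.5 applied to photocurrents of 1 and 3 (arbitrary units) give an output of 1, i.e. a scaled difference of the two pixels, as an edge-detecting kernel would.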

The fabricated chip has 256×256 pixels of VSP detectors with dimension of 35\(\mu\)m × 26\(\mu\)m, in a 2\(\mu\)m 1P-2M NMOS process.

![Figure 2.27: Pixel circuit of a MOS VSP detector.](image-url)
2.22 Venier et al.'s Solar Illumination Monitoring Chip

In order to detect the azimuth, elevation and intensity of the sun, Venier et al. have designed an analog chip [Venier et al. 96]. Pixels in this chip are located on a linear-polar coordinate (See Figure 2.28). At each pixel the photocurrent is compared with the global average and if the intensity at the pixel is high (exposed to the sun), the pixel will output currents at the radial and angular directions, and a copy of the input current to determine the intensity, as shown in Figure 2.29. The angular currents from pixels on the same angle, and the radial currents of the pixels on the same polar coordinate are summed and input to the “center of gravity” circuit, which is a linear diffusive network (See section 2.6) [Andreou et al. 91a, Tartagni and Perona 93, Vittoz and Arreguit 93]. The currents at both ends of the network are read, and the relative center of gravity of the radial and angular currents is found by

\[
X = \frac{I_{\text{right}} - I_{\text{left}}}{I_{\text{right}} + I_{\text{left}}} 
\]
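
The end-current ratio can be checked with an idealized model. The sketch below assumes an ideal linear divider: a line of equal resistors with both ends held at the same potential, so that a current injected at a node splits between the two ends in proportion to the conductance towards each end. This is a simplification of the diffusive network, and `center_of_gravity` is a hypothetical name.

```python
def center_of_gravity(currents):
    """Relative centre of gravity X = (I_right - I_left) / (I_right + I_left)."""
    n = len(currents)
    # Node k (0-based) has k + 1 resistors to the left end and n - k to the right end,
    # so the fraction (k + 1) / (n + 1) of its current reaches the right end.
    i_right = sum(i * (k + 1) / (n + 1) for k, i in enumerate(currents))
    i_left = sum(currents) - i_right
    total = i_right + i_left
    if total == 0:
        raise ValueError("no input current")
    return (i_right - i_left) / total
```

A current injected at the middle node gives X = 0, and X moves towards +1 or -1 as the input moves towards the right or left end.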

The chip has been fabricated in a 2μm CMOS process in an area of $5.5 \times 5.5 \text{mm}^2$. There are 1365 cells, each occupying an area of $95 \times 825 \,\mu\text{m}^2$.

![Figure 2.28: Pixel interconnection in Venier’s chip.](image-url)
Figure 2.29: Pixel circuit in Venier’s chip.
Chapter 3

Spatio-Temporal Image Processing Vision Chips

3.1 Introduction

In pure spatial processing vision chips only image enhancement, edge detection, or other stationary visual tasks are of concern. Spatio-temporal vision chips are concerned with the time dependent features of the image. Although spatial vision chips have shown a degree of robustness for operation under different lighting environments, there are still no claims about the robust operation of spatio-temporal vision chips.

The difficulty of implementing motion detection in VLSI stems from the fact that almost all models require some form of delay or storage element. Most algorithms also require inputs from several time frames. Storage or delay elements, apart from being very area consuming, are difficult to implement. Another reason for the less robust operation of motion detectors is the temporal contrast required for reliable motion detection. The temporal contrast of objects in real scenes is relatively small, and can hardly trigger many analog motion detection chips. The unsatisfactory results from most motion detector chips are driving recent implementations towards intuitive but robust solutions with some structural deviations from the original models.

Algorithms that have been devised for motion detection chips fall into two main categories: biological and computational. Some early implementations were based on the optic-flow theory, which belongs to the computational category. Due to the complexity and inherent problems of this theory, however, no recent motion detection chips have been based on this model. In fact all computational algorithms for motion detection are very complex, and only a few motion detection chips are based on these models. On the other hand, biological models, for example Reichardt’s correlative motion detector, offer a simple structure which is VLSI friendly. Therefore, a large number of vision chips have adopted these models, or modified versions of them.

It should be mentioned that subtracting two frames of the image, though a temporal processing function, cannot be considered motion detection, which implies at least detecting optical flow in time and in space.
3.2 Lyon’s eye

Richard Lyon’s optical mouse [Lyon 81a, Lyon 81b, Lyon and Haeberli 82] is one of the earliest smart vision sensors designed and implemented on silicon. Apart from the specific design approach used for the optical mouse, Lyon also pointed out some of the very fundamental aspects of architectural design methodologies for VLSI smart sensors, i.e. using very simple and conservative device models, design rules, analog and digital circuits, and timing techniques. This methodology has been considered in almost all the analog VLSI chips comprising large arrays of analog circuits to date, and is not going to change unless VLSI processes mature to a highly uniform and reliable level.

Lyon’s eye is basically a digital motion detection chip. Photodetection is performed using diffusion junction diodes in p-substrate in an NMOS process. Photocurrent is integrated over time on a capacitor, and transduced to a digital signal by a simple inverter. The time it takes for the photocurrent to charge up the input capacitance depends on the light level. Therefore, if the neighboring cells laterally inhibit each other, a cell exposed to a brighter pattern will reach a “high” state earlier than other cells. Thus, it can inhibit neighboring cells from firing to a high state. By tracking the location of winner cells at consecutive times, the movement of the input image, which is the fixed pattern of the mouse pad, can be determined. The references discuss the characteristics of the lateral inhibition when used with different neighborhood coverage radii. The algorithm used for detecting motion is intuitive and simple, and is based on tracking the particular pattern of the mouse pad.
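
The race between neighboring cells can be sketched as follows. This is a simplified behavioural model, not Lyon's circuit: it assumes a constant photocurrent per cell, so the time to reach the inverter threshold is proportional to 1/I, and it treats a cell as a winner when it reaches the threshold strictly before every cell within the inhibition radius. The function names and default values are hypothetical.

```python
def winner_cells(photocurrents, radius=1, capacitance=1.0, threshold=1.0):
    """Return indices of cells that reach the inverter threshold before all neighbours."""
    # Time for a constant photocurrent to charge the capacitor to the threshold.
    times = [capacitance * threshold / i if i > 0 else float("inf")
             for i in photocurrents]
    winners = []
    for k, t in enumerate(times):
        lo, hi = max(0, k - radius), min(len(times), k + radius + 1)
        neighbours = times[lo:k] + times[k + 1:hi]
        # Ties produce no winner in this simplified model.
        if t != float("inf") and all(t < other for other in neighbours):
            winners.append(k)
    return winners

def displacement(prev_winners, winners):
    """Mean shift of winner positions between two frames (equal counts assumed)."""
    return sum(b - a for a, b in zip(prev_winners, winners)) / len(winners)
```

Comparing the winner positions of two consecutive frames then gives the movement of the pattern, for example one cell to the right when the winners move from `[1, 4]` to `[2, 5]`.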

The chip was designed in a 5μm NMOS process. A 4×4 array of photodiodes, digital circuits for lateral inhibition, and other digital circuits for interfacing were implemented in an area of 3.5mm×3.5mm.

Figure 3.1: Microphotograph of Lyon’s eye.
3.3 Tanner and Mead’s correlating motion detection chip

This vision sensor implemented in NMOS inherits many design ideas used in Lyon’s chip [Tanner and Mead 84]. It uses correlation to determine the direction of motion. Phototransduction is performed by an integration based photocircuit using a photodiode. The image at one sampling time is digitized to a one bit image pattern and stored in latches. It is then correlated with the analog signal detected in the next sample using the architecture shown in Figure 3.2. The digitization of the image to one bit in the latch branch is done for simplifying the design. Multiplication is performed using a simple current mirror switched by the output of the latch. The currents are summed by hard-wiring the outputs of all current mirrors.
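
The one-bit correlation can be sketched in software. The block below is an illustration under assumptions, since the details of the architecture are only given in the figure: it assumes that the stored bits are correlated with the analog frame at shifts of -1, 0 and +1 pixel, and that the shift with the largest summed current indicates the direction of motion. The names `correlate` and `direction` are hypothetical.

```python
def correlate(prev_frame, frame, shifts=(-1, 0, 1), bit_threshold=0.5):
    """Correlate a 1-bit copy of prev_frame with the analog frame at each shift."""
    bits = [1 if v > bit_threshold else 0 for v in prev_frame]
    scores = {}
    for s in shifts:
        total = 0.0
        for i, b in enumerate(bits):
            j = i + s
            if b and 0 <= j < len(frame):
                total += frame[j]  # the latch output switches the mirrored current on
        scores[s] = total  # hard-wired summation of all mirror outputs
    return scores

def direction(scores):
    """Shift with the largest correlation; its sign gives the direction of motion."""
    return max(scores, key=scores.get)
```

Because one operand is a single bit, each multiplication reduces to switching a current on or off, which is why a switched current mirror suffices.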

A one-dimensional array of this motion detector has been fabricated using a 4µm NMOS process on a 5.7mm × 1.73mm die.
Figure 3.2: Architecture of Tanner and Mead’s correlating motion detector.
3.4 Tanner and Mead’s optic flow motion detection chip

This 2D motion detection chip designed by Tanner and Mead [Tanner and Mead 88, Mead 89b] globally solves the optic flow equation

\[ \frac{\partial I}{\partial t} = -\frac{\partial I}{\partial x} v_x - \frac{\partial I}{\partial y} v_y \]

by feeding back the error

\[ e = D \frac{\nabla I}{|\nabla I|} \]

where

\[ D = -\frac{\left(\frac{\partial I}{\partial x}\right) v_x + \left(\frac{\partial I}{\partial y}\right) v_y + \frac{\partial I}{\partial t}}{\sqrt{\left(\frac{\partial I}{\partial x}\right)^2 + \left(\frac{\partial I}{\partial y}\right)^2}} \]

As can be seen, the algorithm used in the implementation of this chip is based on a purely mathematical formulation of the optic flow derived by Horn and Schunck [Horn and Schunck 81] and modified by Hildreth [Hildreth 85].

The chip contains phototransistors, and temporal and spatial differentiation circuitry for computing the spatio-temporal gradients of the input image, \( \partial I / \partial x \), \( \partial I / \partial y \), and \( \partial I / \partial t \).

The spatio-temporal information is collectively computed across the chip. The output of the chip is a global signal indicating the motion flow of the whole image (see Figure 3.3). Therefore, the chip is only capable of reporting a global motion. Figure 3.4 shows the block diagram of each motion processing element in Figure 3.3.
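
The feedback scheme can be imitated numerically. The sketch below is a discrete-time illustration, not a model of the analog circuit: it sums the error \( e = D \, \nabla I / |\nabla I| \) over all pixels and uses it to update a single global velocity estimate. The step size `rate`, the iteration count and the name `global_flow` are assumptions, and the step size must be small relative to the number of pixels for the iteration to converge.

```python
def global_flow(gradients, rate=0.2, iterations=200):
    """Estimate one (vx, vy) for the whole image by feeding back e = D * grad(I) / |grad(I)|.

    gradients: sequence of (Ix, Iy, It) samples, one per pixel.
    """
    vx, vy = 0.0, 0.0
    for _ in range(iterations):
        ex, ey = 0.0, 0.0
        for ix, iy, it in gradients:
            mag = (ix * ix + iy * iy) ** 0.5
            if mag == 0.0:
                continue  # no spatial gradient, no constraint from this pixel
            d = -(ix * vx + iy * vy + it) / mag
            ex += d * ix / mag
            ey += d * iy / mag
        # The summed error is fed back to correct the global velocity estimate.
        vx += rate * ex
        vy += rate * ey
    return vx, vy
```

When the gradients are consistent with a single velocity, the estimate settles at that velocity, which corresponds to the global motion output of the chip.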

Despite some imperfections pointed out by others [Koch et al. 90], this chip, as one of the first vision chips to be realized, demonstrated that low level vision processing based on mathematical algorithms can be implemented in VLSI.

![Figure 3.3: Architecture of Tanner and Mead’s optic flow motion detector.](image-url)
3.5 Moore and Koch’s multiplicative motion detector

Moore-Koch’s multiplicative motion detector chip is a variation of Tanner-Mead’s motion detector [Moore and Koch 91]. In the motion detector designed by Tanner, the division of the temporal gradient, $\frac{\partial I}{\partial t}$, by the spatial gradients, $\frac{\partial I}{\partial x}$ and $\frac{\partial I}{\partial y}$, has been implemented by a Gilbert multiplier in a feedback loop. In Moore-Koch’s work the feedback loop has been opened, and a multiplication of the temporal and spatial components of intensity is obtained instead. As the work does not contain further conceptual or practical information, the reader is referred to [Moore and Koch 91] for details. Nonetheless, the design illustrates that some motion information can still be obtained even with functions completely different from what the optic flow theory suggests.
3.6 Bair and Koch’s motion detection chip

The motion detection chip designed by Bair and Koch [Bair and Koch 91b] is basically a zero-crossing edge detector. The chip consists of two layers of spatial smoothing resistive networks with different smoothing constants (see Figure 3.5). A comparator computes the difference of the outputs of these layers. The output of the comparator approximates the Laplacian of Gaussian (LOG) function with a Mexican-hat characteristic. The output of the comparator is then provided as current to the zero-crossing detector circuit shown in Figure 3.6. Motion detection is performed off-chip using an 80286 single board computer by tracking the position of zero-crossings.
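
The difference-of-smoothings idea can be illustrated in one dimension. In the sketch below each resistive network is approximated by a normalized exponential kernel, which is an assumption about the network response, and the two space constants are arbitrary. The names `smooth` and `zero_crossings` are hypothetical.

```python
import math

def smooth(signal, length):
    """Smooth with a normalized exponential kernel exp(-|k| / length)."""
    out = []
    for i in range(len(signal)):
        weights = [math.exp(-abs(j - i) / length) for j in range(len(signal))]
        out.append(sum(w * s for w, s in zip(weights, signal)) / sum(weights))
    return out

def zero_crossings(signal, narrow=1.0, wide=3.0):
    """Indices i where the difference of the two smoothed signals changes sign
    between i and i + 1."""
    diff = [a - b for a, b in zip(smooth(signal, narrow), smooth(signal, wide))]
    return [i for i in range(len(diff) - 1) if diff[i] * diff[i + 1] < 0.0]
```

For a step edge the difference changes sign exactly at the step, so tracking the reported index from frame to frame gives the displacement of the edge, which is the part of the computation done off-chip.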

**Figure 3.5:** Block diagram of the Bair-Koch’s LOG implementation using two resistive smoothing networks.

**Figure 3.6:** Schematic diagram of Bair-Koch’s zero-crossing detector.
3.7 Delbrück’s focusing chip

The focusing chip designed by Delbrück [Delbrück 89] is inspired by the accommodation capability of the human visual system and focuses an image onto itself by controlling its distance from the lens. Delbrück briefly describes some of the interesting features of the accommodation capabilities of the human eye, and explains the fact that accommodation can only be done when the image or the eye is moving. In the algorithm used by Delbrück, measures of sharpness, $s$, and the accommodation state, $l$, are combined using the $\tanh$ function of a Gilbert multiplier to obtain the force component $\tanh(s l)$. This feedback force is then used in the dynamical equation of the lens control system. The accommodation state, $l$, is derived from the lens velocity. The computation of the sharpness, $s$, is the primary job of the chip.

The schematic diagram of the chip is shown in Figure 3.7. The front end is the same as Mahowald–Mead’s retina circuitry. The circuit computes the difference between the signal of a photodetector and the smoothed signal. When the image is originally out of focus an edge will be less different from its spatially smoothed version. When the image is focused onto the chip the difference will have its maximum value. In order to get the best results it is better to look at locations where the difference for a specific image, whether focused or out of focus, is already maximum. Intuitively, sharpness can be determined more easily on edges. The MAX circuit in fact finds the maximum difference between the input and the smoothed input. The MAX circuit is a simple variant of WTA (Winner-Take-All) circuits.

In the chip 40 retina cells have been fabricated in a 2$\mu$m CMOS process. Each pixel is 165$\mu$m wide.

![Figure 3.7: Schematic of Delbrück’s focusing chip.](Image)
3.8 Delbrück’s velocity tuned motion sensor

The correlation based motion detector of Delbrück [Delbrück 93c] unlike some other motion detection systems, which depend on spatial and temporal differentiation, uses correlation to extract motion information, and hence claims to be more robust than other motion detectors. The structure of the chip is depicted in Figure 3.8. The signal from a detector is delayed by the delay element and is compared with a reference level. As indicated in the figure there is only one sensitive direction. Motion in the other direction may be detected by another delay-and-compare element in the other direction. The chip has an array of photodetectors, time delay elements, and antibump circuits.

The outputs from delay elements are collectively mixed with the outputs of the photodetectors through capacitive coupling. When a part of image moves at a velocity that matches the delay time of the delay elements and the spatial separation between adjacent photodetectors, the signals at each stage get constructively added together and grow in size as they pass through the array.
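
The velocity tuning can be illustrated with a discrete-time chain. The sketch below is a behavioural model under assumptions: each cell adds a unit pulse from its own photodetector to the delayed output of the previous cell, scaled by a coupling factor below one that stands in for the capacitive coupling. The name `peak_response` and the coupling value are hypothetical.

```python
def peak_response(n_cells, step, delay, coupling=0.8):
    """Peak signal at the last cell for a pulse crossing one cell every `step` time units.

    Each cell adds its own photodetector pulse to the coupled, delayed output
    of the previous cell.
    """
    n_times = n_cells * max(step, delay) + 1
    out = [[0.0] * n_times for _ in range(n_cells)]
    for k in range(n_cells):
        for t in range(n_times):
            own = 1.0 if t == k * step else 0.0
            carried = out[k - 1][t - delay] if k > 0 and t >= delay else 0.0
            out[k][t] = own + coupling * carried
    return max(out[-1])
```

When the travel time between adjacent photodetectors equals the delay (`step == delay`), the contributions pile up from cell to cell; otherwise they arrive at different times and the peak stays at the level of a single pulse.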

A difference between this approach and the elementary motion detection (EMD) methods is that in EMD methods only two contiguous cells take part in the motion detection process, but in Delbrück’s approach all the previous cells affect the output of the motion detection.

Delbrück has implemented a hexagonal 2D [Delbrück 93c] motion sensor. Photodetection is performed by a novel adaptive photodetector circuit shown in Figure 3.9. The output of the motion detector is obtained from an antibump circuit, which provides a nonlinearity function [Mead 94, Delbrück 93c, Delbrück 93b].

The 2D chip contains 26×26 cells in an area of 6.9mm×6.8 mm, using a 2μm CMOS process.
Figure 3.8: Top: the architecture, bottom: the schematic diagram of a cell of Delbrück’s motion detector.

Figure 3.9: The circuit diagram of the adaptive photodetector. The shaded box contains the adaptive element.
3.9 Meitzler et al.'s sampled-data motion chip

Meitzler et al. [Meitzler et al. 93] describe a 1D motion detection chip similar to Reichardt correlation-based motion detectors. Instead of using a delay element as in the Reichardt model, Meitzler has used a sample and hold circuit to delay the signal. The stated reason for using sample and hold is the need to integrate the output of the motion detector, which would otherwise have saturated due to the offset voltage in the delay stage. The architecture of the sample and hold motion detector is shown in Figure 3.10. The front-end of the chip uses Andreou’s retina cells, which are described in Section 2.6 [Andreou and Boahen 94b]. The sample and hold circuit is based on Vittoz et al.’s circuit described in [Vittoz et al. 91]. This S&H circuit has a relatively long retention time (on the order of a few minutes). It should also be noted that the multiplication required in Reichardt’s model has been replaced by an absolute value of difference function.

The chip has been fabricated in a $2\mu m$ CMOS process in a “Tiny” chip (2.2mm ×2.2mm). It contains 22 cells, each occupying an area of $365 \times 77 \mu m^2$.

Figure 3.10: Meitzler et al.’s sample and hold motion detector.
3.10 Moini et al.’s insect vision-based motion detection chip

Moini et al.’s insect vision chip is a biologically inspired motion sensor [Moini et al. 93, Moini 94, Yakovleff et al. 93]. It is an implementation of the template model [Horridge and Sobey 91] for insect vision.

A simplified model of the insect visual system is illustrated in Figure 3.11. The insect’s neuro-optical system is composed of three layers: lamina, medulla, and lobula or lobula complex. The lamina consists of photodetectors and automatic gain control circuitry. The medulla contains the small spatial field motion detectors, in addition to many other complicated functions. The primary wide-field motion computation is located in the lobula complex. The lobula plate (a part of the lobula complex) is also characterized by large directionally sensitive motion detection (DSMD) neurons. In the template model the motion information is obtained by thresholding the temporal gradient of the intensity, $\frac{\partial I}{\partial t}$, at each pixel. The resulting output indicates three states: Increase, Decrease, and No-Motion, which can be coded using two digital bits. This output is then sampled and stored. Templates are formed by collating the outputs of two contiguous cells at two consecutive sampling instants. The templates are then coded to represent low level motion information. The rest of the processing, which involves tracking some specific templates, is done using six tracking engines. The locations of the tracked templates are reported off-chip.
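
Template formation can be sketched in a few lines. The block below is an illustration only: the temporal gradient is approximated by a frame difference, the three states are coded as +1 (Increase), -1 (Decrease) and 0 (No-Motion), and the ordering of the four entries in a template is an assumption. The function names are hypothetical.

```python
def quantize_changes(prev_frame, frame, threshold):
    """Threshold the frame difference into +1 (Increase), -1 (Decrease) or 0 (No-Motion)."""
    states = []
    for a, b in zip(prev_frame, frame):
        d = b - a
        states.append(1 if d > threshold else -1 if d < -threshold else 0)
    return states

def form_templates(states_t0, states_t1):
    """Collate the states of two contiguous cells at two consecutive sampling instants.

    Each template is (cell i at t0, cell i+1 at t0, cell i at t1, cell i+1 at t1).
    """
    return [(states_t0[i], states_t0[i + 1], states_t1[i], states_t1[i + 1])
            for i in range(len(states_t0) - 1)]
```

For a bright edge advancing by one cell per sample, the first pair of cells yields the template `(1, 0, 0, 1)`: an increase at the left cell followed by an increase at the right cell, which encodes motion from left to right.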

The chip has a 1D array of 64 photoreceptors, followed by a differentiator shown in Figure 3.13. It also contains RAMs for storing the templates and final results only for interfacing purposes. Six search-and-track engines have also been implemented which operate on specified areas of interest in the image. The chip has been fabricated in a 2μm CMOS process in an area of 4.5mm×4.6mm. The detectors and analog processing elements only occupy 1.8mm×0.6mm and the rest is dedicated to digital processing modules.
Figure 3.11: A simplified block diagram of insect visual system used in Moini et al.’s insect vision chip.

Figure 3.12: Formation of templates in Moini et al.’s insect vision chip.

Figure 3.13: Schematic of Moini’s differentiator.
3.11 Moini et al.’s second insect vision-based motion detection chip

This second implementation of the template model is architecturally similar to the first implementation described in section 3.10. This chip, however, has several special circuits built into it, in addition to more flexibility and testability features. The chip architecture is shown in Figure 3.14. The chip has two 1-D arrays of 64 photoreceptors. The photodetectors are based on the multisensitivity photodetector and can be selected from two options, a simple photodiode and a bipolar transistor (See section 7.2.1).

The next processing stage is multiplicative noise cancellation (MNC), where the signal in one channel is divided by the local spatial average. In addition to cancelling multiplicative noise from light sources reflected from the surface of objects, this operation performs an edge enhancement and normalization operation. The spatial averaging and division are realized using the circuits shown in Figure 3.15.
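
The effect of dividing by the local average can be shown with a small example. The sketch assumes strictly positive input signals and a window that is clipped at the ends of the array; the name `mnc` and the window radius are hypothetical.

```python
def mnc(signal, radius=1):
    """Divide each sample by the mean of its neighbourhood (window clipped at the ends)."""
    out = []
    for i, value in enumerate(signal):
        window = signal[max(0, i - radius): i + radius + 1]
        out.append(value / (sum(window) / len(window)))
    return out
```

Scaling the whole input by a constant, as a change of illumination level would, leaves the output unchanged: `mnc([2.0, 4.0, 2.0])` and `mnc([20.0, 40.0, 20.0])` give the same values, with the central peak emphasized relative to its surround.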

The final stage is the differentiation. The new differentiator uses a two-mode transconductance amplifier in its feedback loop. The OTA can be configured to either an Early effect mode or a simple five-transistor OTA. The circuit is shown in Figure 3.16.

The chip has been fabricated in a 1.2μm CMOS process in an area of 2.2mm × 2.2mm.
Figure 3.14: The architecture of the second implementation of Moini et al.’s insect vision based motion detection chip.
Figure 3.15: Schematic of the spatial averaging and division circuits used in the MNC operation.

Figure 3.16: Schematic of Moini’s differentiator with two different operating modes.
3.12 Dron’s multi-scale veto CCD motion sensor

The motion detector designed by Lisa Dron [Dron 93, Dron 94, McIlrath 96] is essentially an edge detection chip. It uses CCD devices for image acquisition and spatial and temporal processing of the image. An algorithm called multi-scale veto (MSV) has been used in the implementation. In MSV a sequence of spatial smoothing functions are applied to the image. The smoothing filters have increasing spatial span. An edge is identified when the difference between contiguous pixels on the edge can pass a threshold level for all the spatial filters. The main difference between MSV and other methods, like zero-crossing edge detection, is that the computation of the second spatial differentiation, i.e. $\nabla^2 I$, is not necessary for MSV.

The implementation of multiple spatial filters is facilitated by clocked CCD circuits. The image information is stored in potential wells, and at each processing cycle is passed through a CCD charge redistribution network, shown in Figure 3.17, which performs a simple spatial averaging. By repeating the cycle, the smoothing function acts over more pixels and widens.
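
A one-dimensional sketch of the multi-scale veto idea follows. It is an illustration under assumptions: one CCD averaging cycle is modelled as a pass with weights 1/4, 1/2, 1/4, and an edge between two contiguous pixels is kept only if their difference exceeds the threshold before smoothing and after each pass. The function names and the number of scales are hypothetical.

```python
def smooth_once(x):
    """One averaging pass with weights 1/4, 1/2, 1/4 (border pixels replicated)."""
    padded = [x[0]] + list(x) + [x[-1]]
    return [(padded[i] + 2.0 * padded[i + 1] + padded[i + 2]) / 4.0
            for i in range(len(x))]

def msv_edges(signal, threshold, scales=3):
    """Indices i where |x[i+1] - x[i]| exceeds the threshold at every smoothing scale."""
    candidates = set(range(len(signal) - 1))
    x = [float(v) for v in signal]
    for _ in range(scales):
        # An edge that fails the test at any scale is vetoed for good.
        candidates = {i for i in candidates if abs(x[i + 1] - x[i]) > threshold}
        x = smooth_once(x)  # repeating the cycle widens the effective filter
    return sorted(candidates)
```

A step from 0 to 10 survives all scales with a threshold of 3, whereas an isolated one-pixel spike of the same height is vetoed after the first smoothing pass.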

The final fabricated chip contains $32 \times 32$ detectors on the MOSIS "LARGE" die size of 7.9mm $\times$ 9.2mm. The size of each cell is 224 $\times$ 224 $\mu m^2$. A significant area is dedicated to routing the clock signals required for operating the CCD circuits. A few malfunctions reported in [Dron 94] have been attributed to the poor CCD process used in the implementation.

![Figure 3.17: The charge redistribution circuit using CCD elements used in Dron’s MSV motion edge detection chip. From top to bottom are the snapshots of the operation.](image-url)
3.13 Horiuchi et al.’s delay line-based motion detection chip

This motion detection chip is based on an extended delay-based motion detection algorithm [Horiuchi et al. 91]. The signal from each photoreceptor is passed through a firing neuron which fires on the detection of a specific feature (the rising edge of the signal from photodetectors). The outputs of the firing neurons are then passed through a series of delay lines, and are compared with the delayed signals from neighboring cells. The signals from neighboring photoreceptors are delayed in opposite directions as shown in Figure 3.18. When two contiguous neurons fire, their output signals race through the delay lines towards each other and meet somewhere on the line, where a large signal will be created by the correlating circuits (the circles in Figure 3.18). The meeting point will be detected by the winner-take-all circuit. If the neurons fire at the same time, the signals will meet in the middle of the delay line. Positive or negative motion can be detected by looking at the displacement with respect to the middle point.
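
The meeting point on the delay line follows from simple timing. The sketch below assumes an ideal line of `n_taps` taps with a uniform delay per tap and two pulses launched from opposite ends; which sign of the displacement corresponds to which direction of motion is an assumption. The function names are hypothetical.

```python
def meeting_tap(t_left, t_right, n_taps, tap_delay):
    """Tap index where pulses launched from opposite ends of a delay line coincide.

    t_left and t_right are the firing times of the neurons feeding the left and
    right ends; the index may be fractional.
    """
    # The left pulse reaches tap k at t_left + k * tap_delay; the right pulse
    # reaches it at t_right + (n_taps - 1 - k) * tap_delay. Equate and solve for k.
    k = 0.5 * ((n_taps - 1) + (t_right - t_left) / tap_delay)
    if not 0 <= k <= n_taps - 1:
        return None  # the pulses do not meet on the line
    return k

def motion_sign(t_left, t_right, n_taps, tap_delay):
    """+1, -1 or 0 according to which side of the middle tap the pulses meet on."""
    k = meeting_tap(t_left, t_right, n_taps, tap_delay)
    if k is None:
        return None
    middle = 0.5 * (n_taps - 1)
    return (k > middle) - (k < middle)
```

Simultaneous firing gives the middle tap, and the displacement from the middle grows with the firing time difference, so the range of measurable velocities is set by the length of the line and the delay per tap.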

Although the method seems promising as a multi-velocity detector (compared with Delbrück’s single velocity tuned motion detector described in Section 3.8), due to the aggregation of the output signals of the correlators it can only detect a uniform optic flow across the 1D array. In fact it can only report a single global velocity vector, and is not able to locate multiple objects moving at the same or different velocities.

The fabricated chip contains 28 photodetectors.
Figure 3.18: Architecture of Horiuchi et al.'s motion detection chip.
3.14 Chong et al.’s change detector

Although Chong et al.’s chip [Chong et al. 92] detects intensity changes over time, it does not attempt to implement any vision algorithm, and the only operation performed on the chip is analog temporal differentiation of photocurrents. The chip, however, utilizes a compact current mode circuit for differentiating the photocurrents and generating a pulse on the occurrence of an increasing or decreasing light intensity. The current mode differentiator is shown in Figure 3.19. When the input light intensity decreases, the voltage at the node Out goes down. The negative feedback loop will eventually fix the operating point of the circuit at a point where \( I_{\text{photo}} = I_{\text{feedback}} \). In the meantime a voltage pulse will be detected at the output. This voltage pulse is converted to current and read out through an x-y switch network.

The circuit faces two issues. Firstly, the circuit can only detect decreases in the intensity. This is because a simple inverting amplifier consisting of M1 and M2 is used instead of a conventional OTA: an increase in the input of the delay element increases the current drive of M2, which can quickly charge up the capacitor. This can be resolved by using a 5-transistor OTA. Secondly, the circuit is conditionally stable. It can easily be shown that in order to stabilize the circuit, the biasing current, \( I_{\text{bias}} \), should be very small.

The implemented chip contains 25×25 cells. In the 3μm CMOS process a cell area of 22000μm² is achieved.

![Figure 3.19: Chong’s current mode differentiator.](image-url)
3.15 Gottardi and Yang’s CCD/CMOS motion sensor

In [Gottardi and Yang 93] Gottardi and Yang describe a 1D linear CCD image sensor and correlator. The outputs of the 1D imager are stored in two analog CCD memories, each storing one frame of the image. In the next stage the data from one frame is shifted by 5 pixels in the preshift block. In the correlator block the data of the two frames are shifted spatially and correlated pixel-wise. Pixel correlation is performed by an absolute value of difference (AVD) circuit. The outputs of the AVD circuits (which are in the form of charge) in one row are added together. A winner-take-all circuit determines the position of the largest correlation among the eleven outputs of the correlator block. The block diagram of the chip and the architecture of the correlator are shown in Figure 3.20.
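
The correlator can be imitated in software as a sum-of-absolute-differences search. The sketch below rests on two assumptions that are not stated in the text: the 5-pixel preshift is interpreted as centring the eleven correlator outputs on shifts from -5 to +5, and the best match is taken as the shift with the smallest summed absolute difference. The name `best_shift` is hypothetical.

```python
def best_shift(prev_frame, frame, max_shift=5):
    """Shift in [-max_shift, max_shift] minimizing the sum of absolute differences."""
    n = len(frame)
    scores = {}
    for s in range(-max_shift, max_shift + 1):
        # Only interior pixels are compared so that every shift sees the same count.
        scores[s] = sum(abs(frame[i] - prev_frame[i - s])
                        for i in range(max_shift, n - max_shift))
    return min(scores, key=scores.get), scores
```

With `max_shift=5` the dictionary holds eleven scores, one per correlator row, and selecting the extremum plays the role of the winner-take-all stage.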

Figure 3.20: a) Block diagram of the chip. b) Architecture of the correlator in Gottardi-Yang’s CCD motion sensor.
3.16 Kramer et al.’s velocity sensor

Kramer et al.’s vision chip is a 1D motion detector [Kramer et al. 95]. The algorithm used in the implementation is rather intuitive, though the basic element used is very similar to elementary motion detection (EMD) units. The architecture of the motion detection unit is shown in Figure 3.21. I represents the photodetectors, which use Delbrück’s adaptive photocircuit [Mead 94], E is the edge detection circuit, P is a pulse shaping circuit, and M is the motion computation element. The edge detection circuit E, which is shown in Figure 3.22, issues an output current pulse on the occurrence of an increase of the input light intensity. The output of the edge detector is fed to the pulse shaping circuit P, shown in Figure 3.23, which generates two voltage pulses. One of them, $V_{fast}$, basically follows the shape of the input current spike, while the other one, $V_{slow}$, only follows the increasing edge and has a long decaying tail.

The motion detection unit M samples the $V_{slow}$ output of one channel using the $V_{fast}$ pulse of the other channel. If there is no motion, the $V_{slow}$ outputs will be low. If there is motion, for example from detector 1 to detector 2, unit $M_2$ will output a nonzero value. This voltage will be higher if the time between the start of the decay of $V_{slow1}$ and the $V_{fast2}$ spike is shorter.
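The sampling principle can be illustrated with a small behavioral model. This is a sketch under stated assumptions, not the circuit itself: $V_{slow}$ is modeled as a unit step followed by an exponential decay with a hypothetical time constant `tau`, and $V_{fast}$ is treated as an ideal sampling instant.

```python
import math


def motion_output(t_edge_1, t_edge_2, tau=0.05):
    """Value of V_slow1 sampled at the instant of the V_fast2 spike.

    t_edge_1, t_edge_2: arrival times (s) of the edge at detectors 1 and 2.
    tau: assumed decay time constant of V_slow (hypothetical value).
    """
    dt = t_edge_2 - t_edge_1
    if dt < 0:
        # The edge reached detector 2 first, so V_slow1 has not risen yet.
        return 0.0
    return math.exp(-dt / tau)


fast_edge = motion_output(0.0, 0.01)   # short transit time: high output
slow_edge = motion_output(0.0, 0.05)   # long transit time: low output
wrong_way = motion_output(0.05, 0.0)   # motion from detector 2 to detector 1
print(fast_edge, slow_edge, wrong_way)
```

In this model the output falls monotonically with the transit time, so a larger sampled value corresponds to a faster edge.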

The chip has been designed and fabricated in a 2$\mu$m CMOS process. Each cell takes an area of 50,000$\mu$m$^2$. The chip has shown some degree of robustness to contrast and light level variations.

![Architecture of Kramer et al.’s motion detection system.](image)
Figure 3.22: Kramer et al.’s temporal edge detection circuit. Labeled blocks: Delbrück’s adaptive photoreceptor, W-OTA, temporal edge detector. Labeled signals: Bias, $V_{g}$, $V_{out}$.

Figure 3.23: Circuit to generate a fast and a slowly decaying signal.
3.17 Indiveri et al.’s time-to-crash sensor

The time-to-crash sensor of Indiveri et al. is a demonstration of how a neuromorphic vision chip can be used for real-life applications [Indiveri et al. 96a, Indiveri et al. 96b]. The chip uses Gauss’s theorem, which states that the surface integral of the divergence of a vector field over an area is equal to the line integral of the normal component of the field along the boundary of that area. For a camera moving at a constant velocity, the velocity field is linear and its divergence is constant. Therefore, by integrating the velocity vectors at the boundary of a circle, the time-to-crash can be estimated using Gauss’s theorem by

\[ TTC = \frac{N \cdot R}{\sum_{k=1}^{N} v_k} \]

where \( N \) is the number of elements around the circle, \( R \) is the radius of the circle, and \( v_k \) are radial velocity components at the elements.
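As a numerical check of this formula, the sketch below evaluates it for an ideal expansion field. The detector count, radius, and velocity values are hypothetical and chosen only for illustration.

```python
def time_to_crash(radial_velocities, radius):
    """Evaluate TTC = N * R / sum(v_k) for N radial velocity samples."""
    n = len(radial_velocities)
    total = sum(radial_velocities)
    if total == 0:
        return float("inf")  # no net expansion, so no approach is detected
    return n * radius / total


# Hypothetical example: 12 detectors on a circle of radius 500 um.
# For a pure expansion field v(r) = r / T, every radial velocity is R / T.
R = 500e-6   # m
T = 2.0      # s, the true time-to-crash in this synthetic field
velocities = [R / T] * 12
ttc = time_to_crash(velocities, R)
print(ttc)
```

Because the formula averages the radial velocities around the circle, noise on individual detectors tends to cancel in the sum.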

The prototype time-to-crash sensor comprises 12 motion detector elements around a circle as shown in Figure 3.24. The motion detectors generate two positive and negative outputs. To avoid aliasing, the smaller value is set to zero. The outputs are then summed over the entire circle for positive and negative velocities. The two values are finally subtracted to give an indication of the time-to-crash.

The sensor has been implemented in a 2µm CMOS process in a TINY chip (2.2mm×2.2mm). The radii of the inner and outer circles on which the photodetectors are located are 400µm and 600µm, respectively.
Figure 3.24: Indiveri’s time-to-crash sensor.
3.18 Indiveri et al.'s direction-of-heading detector

In order to determine the focus of expansion (FOE) or direction of heading (DOH), Indiveri et al. have implemented an analog VLSI chip which computes heading direction [Indiveri et al. 95]. In principle the FOE is the point where all optical flow vectors intersect, assuming a stationary scene and purely translational motion. Therefore, once the optical flow is determined, simple algorithms can find the FOE. A simple method, which is used in this implementation, relies on the fact that the direction of optical flow reverses at the FOE. In other words, the FOE is the zero-crossing of the optical flow. A prototype chip has been designed which incorporates the required functional blocks. Architecturally, the chip contains an array of motion sensing elements followed by two separate layers of nonlinear lateral inhibition circuits for positive and negative velocity values. The outputs of the nonlinear LI layers are spatially smoothed using Boahen-Andreou’s diffusive network (See section 2.6). A correlation circuit based on Delbrück’s bump circuit [Delbrück 91] finds the correlation between the output of the positive channel and the output of the neighboring negative channel. Hence, a measure of the zero-crossing and of the steepness of the curve is found. As nonideal effects may produce several zero-crossings, a winner-take-all circuit selects only one of them. The simplified architecture is illustrated in Figure 3.25.
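The zero-crossing idea can be sketched in a few lines. This is an illustrative software analogue, not the chip's circuit: the function `find_foe`, the linear interpolation, and the use of slope as the winner-take-all criterion are assumptions made for the example.

```python
def find_foe(flow):
    """Locate the FOE in a 1-D flow profile as its steepest
    negative-to-positive zero-crossing.

    Returns an interpolated pixel position, or None if no crossing exists.
    """
    best_pos = None
    best_slope = 0.0
    for i in range(len(flow) - 1):
        left, right = flow[i], flow[i + 1]
        if left < 0.0 < right:
            slope = right - left
            if slope > best_slope:  # keep only the steepest crossing
                best_slope = slope
                best_pos = i + (-left) / slope
    return best_pos


# Hypothetical 24-element flow profile expanding about position 9.5.
flow = [0.1 * (x - 9.5) for x in range(24)]
foe = find_foe(flow)
print(foe)
```

Selecting the steepest crossing plays the role that the winner-take-all stage plays on the chip when spurious zero-crossings are present.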

The chip, which contains a 24-element 1-D array, has been implemented in a 2μm CMOS process.

![Diagram of Indiveri's time-to-crash sensor](image-url)
3.19 McQuirk’s CCD focus of expansion estimation chip

McQuirk has designed a CCD/CMOS chip for estimating the focus of expansion (FOE) [McQuirk 96a, McQuirk 96b]. The algorithm which has been implemented in the chip tries to minimize the depth $Z$ with constraints on the value of the translation vector $t$. In this process points in the image which have a higher temporal variation are given smaller weights and points which are steady or have small variation are given large weights, as these points are more likely to be the FOE point. Solving the derived equations requires pixel interaction from the whole image, which is not feasible. Also the large number of variables to be calculated at each pixel inhibits a fully parallel analog VLSI implementation. Therefore, a discrete-time iterating method (which can easily be implemented in CCDs) is used for solving the equations.

The chip comprises a two-dimensional array of CCD interline imagers, and a one-dimensional array of analog signal processing elements. The interline imager can provide two image frames at the same time, so that temporal gradients of the image can be calculated. The output of the imager array is scanned and passed through floating-gate CCD read-out circuits. Then CMOS circuits are used for implementing various functions required in the processing. The position encoder provides the position of the current pixel. The estimate of the FOE is calculated off-chip and is fed back to the processing array.

All the circuits are designed using differential current-mode techniques in the above-threshold regime. The chip architecture is shown in Figure 3.26.

The chip has been designed in a 2μm CCD/BiCMOS process from Orbit. The chip has $64 \times 64$ pixels with a size of $108\mu m \times 108\mu m$. This large pixel size has been determined by the pitch of the analog processing array.

![Figure 3.26: Architecture of McQuirk’s FOE chip. Using the CCD shifter at the left the image from the imager can be read out, and also synthetic images can be applied to test the analog processing array.](image-url)
3.20 Gruss et al.'s range finder

Gruss et al.'s sensor is a range finding chip for parallel light-stripe range detection [Gruss et al. 91]. In light-stripe range sensing methods a stripe of light (usually laser light) is projected onto the scene. The location of the projected stripe on objects, as seen by the detector, depends on their range from the projector and the detector. If the geometry of the projecting instrument and the imager is known, range information can be extracted from the location of the reflected stripes in the image. Of course, to obtain a complete range map of the scene, the projection-and-measurement process must be repeated until the desired (and achievable) resolution is reached. Further details of the method can be found in [Gruss et al. 91].

In the VLSI implementation of this method, the presence of the light stripe is detected by circular photodetectors (circular detectors are not sensitive to the rotation of stripes). When the light stripe passes over a photodetector, a current pulse is generated. The position of the pulse, in time, gives a measure of the range (knowing the geometry of projection and detection equipment). The temporal position of the pulse is stamped by a threshold-and-stamp circuit illustrated in Figure 3.27. Using this circuit the output of the chip can be easily associated with the range information.
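A behavioral sketch of the threshold-and-stamp operation is given below. It only models the time-stamping step. The conversion from time stamp to range depends on the projection geometry, which is not specified here and is therefore left out. The function name and the sample data are hypothetical.

```python
def stamp_times(frames, times, threshold):
    """frames[k][p] is the photocurrent of pixel p at times[k].

    Returns, for each pixel, the first time its signal exceeds the
    threshold, or None if the stripe never crossed that pixel.
    """
    n_pixels = len(frames[0])
    stamps = [None] * n_pixels
    for t, frame in zip(times, frames):
        for p, value in enumerate(frame):
            if stamps[p] is None and value > threshold:
                stamps[p] = t  # latch the time of the first crossing only
    return stamps


# Hypothetical sweep: the stripe passes pixel 0, then pixel 1, never pixel 2.
times = [0.0, 1.0, 2.0, 3.0]
frames = [
    [0.1, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.0],
    [0.0, 0.1, 0.0],
]
stamps = stamp_times(frames, times, threshold=0.5)
print(stamps)
```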

Two chips have been designed using the same circuit: a $6 \times 6$ array and a $28 \times 32$ array. Each cell, including the photodetector and processing circuitry, occupies an area of about $250 \mu m \times 250 \mu m$, of which a quarter is used by the circular photodiode.

![Figure 3.27: The time-stamp circuitry.](image-url)
3.21 Sarpeshkar et al.’s pulse mode motion detector

Sarpeshkar et al.’s motion detector described in [Sarpeshkar et al. 93] is an implementation of Reichardt’s model for motion detection, in which the signal from one input channel is delayed and correlated with its neighbors. A zero-crossing edge detector (See section 3.6) finds the edges. The edges pertaining to positive and negative gradients (LE and RE) are separated and applied to separate motion detection units (See Figure 3.28).

The motion detection unit has the generic DSMD (directionally selective motion detector) structure, with two EMD (elementary motion detector) units and a subtracter. Two major building blocks used in this chip are shown in Figure 3.29. The circuit in Figure 3.29-a is the schematic of the DSMD. The signals from two neighbors are passed through two branches of pulse shaping circuits, one of which introduces a delay to the pulse (the branch with D and P boxes). Each signal is compared with the delayed signal of its neighbor using a NAND structure. The outputs of the two NAND structures are subtracted using a simple current mode subtracter, which produces an output current pulse depending on the direction of motion.
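The delay-and-compare structure of a DSMD can be sketched with binary pulse trains. This is a logical illustration only: it uses a plain AND (product) of the pulses where the chip uses a NAND structure, and the function name, sample-based delay, and pulse trains are assumptions for the example.

```python
def dsmd(pulses_a, pulses_b, delay):
    """Directionally selective motion detector on sampled 0/1 pulse trains.

    Each EMD correlates the delayed pulse of one channel with the undelayed
    pulse of its neighbor, and the two EMD outputs are subtracted.
    Positive result: motion from A to B. Negative result: motion from B to A.
    """
    net = 0
    for t in range(delay, len(pulses_a)):
        net += pulses_a[t - delay] * pulses_b[t]  # EMD tuned to A -> B
        net -= pulses_b[t - delay] * pulses_a[t]  # EMD tuned to B -> A
    return net


# Hypothetical edge that hits channel A at sample 2 and channel B at sample 5.
a = [0, 0, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 0, 0]
forward = dsmd(a, b, delay=3)
backward = dsmd(b, a, delay=3)
print(forward, backward)
```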

The pulse shaping circuit shown in Figure 3.29-b generates an output pulse which is delayed by a value depending on the biasing current $I_D$ and the capacitance $C_2$.

![Figure 3.28: Architecture of Sarpeshkar et al.’s motion detector.](image)
Figure 3.29: a) Schematic of a DSMD. b) Circuit diagram of a pulse shaping circuit. $V_L$, $V_D$, and $V_R$ determine the threshold, pulse width, and an unusable period after pulse generation. For proper operation $I_R < I_D$. c) Shape of the output waveform. Note that if another input pulse arrives before the period $T_R$ ends, the next pulse will have a different duration $T_D$. Also, if $I_R > I_D$, the circuit becomes oscillatory.
3.22 Meitzler et al.’s 2D position and motion detection chip

This vision chip is an example of a multifunctional vision chip, in which several tasks are performed within the chip. The core of the chip is composed of a 2D array of Andreou-Boahen’s retina cells [Boahen and Andreou 92, Andreou and Boahen 94b]. Two one-dimensional motion detection arrays and two one-dimensional centroid computation arrays carry out the computation. Motion detection is performed by the circuit introduced in section 3.9. Centroid computation is done using DeWeerth’s architecture in section 2.16. The motion and centroid functions are performed only on the central row and column of the retina array. This is done by sampling the central row and column and holding them in a sample & hold circuit described in [Vittoz et al.]. As Andreou-Boahen’s retina intrinsically removes the centroid information, an offset is added to the output of the retina before applying it to the centroid computation circuit. The architecture of the chip is shown in Figure 3.30.

The chip has been targeted at sun-tracking in high-altitude balloons. The shape of the sun and its high contrast with the dark sky at an altitude of 35 km avoid complications due to low contrast or multiple objects of arbitrary shape. In fact this chip is an excellent example of a very small custom vision chip for a constrained vision task, which would otherwise have required much larger hardware with at least ten times the size and power dissipation.

The 50×50 retina array and the accompanying motion and centroid computation circuits occupy an area of 6.8mm×6.9mm in a standard 2μm CMOS process. The computational core dissipates 17.5 mW.

Figure 3.30: Architecture of Meitzler et al.’s position and motion computation vision chip.
3.23 Aizawa et al.’s Image Sensor with Compression

Aizawa et al. describe an image sensor which incorporates sensor-level compression [Aizawa et al. 94, Aizawa et al. 95], significantly reducing the amount of image data to be read out. The compression algorithm is based on conditional replenishment [Jain 89], in which the pixel value is compared with the previously sampled and stored value (See Figure 3.31). If the difference exceeds a threshold, the Activate signal is asserted; this signal controls the scanning logic when the row containing that cell is scanned. The scanning logic bypasses all inactive cells and only reads out the pixel values of activated cells, hence reducing the scanning time.
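Conditional replenishment can be sketched as follows. This is a minimal software model: the function name, the use of an absolute difference as the comparison, and the sample frames are assumptions for illustration, whereas on the chip the comparison and the skip logic are performed in the pixel and scanning circuits.

```python
def conditional_replenishment(frame, stored, threshold):
    """Read out only the pixels that changed by more than the threshold.

    frame, stored: 2-D lists of pixel values (current and stored samples).
    Updates `stored` in place and returns a list of (row, col, value).
    """
    readout = []
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if abs(value - stored[r][c]) > threshold:  # pixel is activated
                readout.append((r, c, value))
                stored[r][c] = value  # replenish the stored sample
    return readout


# Hypothetical 2x2 example in which only one pixel changes significantly.
stored = [[10, 10], [10, 10]]
frame = [[10, 30], [12, 10]]
readout = conditional_replenishment(frame, stored, threshold=5)
ratio = (len(frame) * len(frame[0])) / len(readout)
print(readout, ratio)
```

The ratio of total pixels to pixels read out gives the compression ratio for that frame, which is why the achievable ratio depends on image content and frame rate.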

The compression ratio depends on the contents of the image and the frame rate. For very high frame rate applications, ratios of about 100 can be obtained. For normal applications, ratios of about 10 are obtainable. The 2D array of 32×32 elements has been fabricated using a 2µm CMOS process. Each cell occupies an area of 170µm×170µm.

In [Hamamoto et al. 96b, Hamamoto et al. 97] a column parallel architecture for the compression sensor is described. In these new sensors the photodetector and storage elements are separated into two two-dimensional arrays. The processing elements, which occupy only one column, are located between the two arrays (See Figure 3.33). This method brings several advantages, including increased density and fill factor for the detectors and a reduction of the number of processing elements to a single column. This architecture is well suited to algorithms which operate on individual pixels or on a neighborhood in the y direction (the direction of the processing element column).

![Figure 3.31: Schematic of a pixel in the Aizawa et al.’s image compression sensor.](image-url)

Figure 3.32: Architecture of Aizawa et al.'s compression sensor.

Figure 3.33: a) Pixel parallel architecture, with photodetector, storage element, and processing element in each pixel. b) Column parallel architecture, with separated arrays for photodetectors, memory and processing element.
3.24 Hamamoto et al.’s Image Sensor With Motion Adaptive Storage Time

Dynamic range in conventional charge integration-based imagers is limited by the integration time, which is global to all pixels in an imager. In [Hamamoto et al. 96a, Hamamoto et al. 96c, Hamamoto et al. 96b, Hamamoto et al. 97] Hamamoto et al. describe an imager which compresses the image using motion information (similar to the image compression sensor in section 3.23) and at the same time increases the dynamic range by controlling the integration time of individual pixels depending on the saturation status of the pixel.

The schematic diagram of the processing circuit is shown in Figure 3.34. Whenever the temporal change at the pixel exceeds a threshold, or the detector is saturated, a flag is turned on. Consequently, the pixel value is output and the detector is reset. If the flag is not activated the detector continues its charge integration operation until the flag is activated.

Current mode techniques have been used in the design of the processing element. A column parallel architecture has been used for the chip (See section 3.23). The fabricated chip has 32×32 pixels in a 1P-2M 1µm CMOS process. The dimensions of the detector, analog memory, and processing elements are 85µm × 85µm, 85µm × 46µm, and 85µm × 191µm, respectively. The chip dissipates 150 mW of power and has a processing speed of 2µs/row.

![Figure 3.34: Processing circuitry in Hamamoto et al.’s motion adaptive sensor.](image-url)
3.25 Simoni et al.’s Optical Sensor and Analog Memory Chip with Change Detection

In [Simoni et al. 95b, Simoni et al. 95a] Simoni et al. describe an analog memory with some peripheral circuitry for difference detection. Each pixel comprises a photodiode and a storage capacitance. When a particular pixel is selected, first the previous pixel value which is stored in the capacitor is read out, and then the present value is read from the photodiode. A circuit computes the difference between these two values. The architecture and the circuits are based on the switched-capacitor technique. The schematic of a cell is shown in Figure 3.35. It should be noted that the change detection capability of this chip is not based on any algorithms or models.

![Schematic of a cell of Simoni et al.'s motion detection chip.](image)

Figure 3.35: Schematic of a cell of Simoni et al.'s motion detection chip.
3.26 Espejo et al.’s Smart Pixel CNN

In [Espejo et al. 94e, Espejo et al. 94b, Espejo et al. 92, Espejo et al. 93b, Espejo et al. 93a, Espejo et al. 94d, Espejo et al. 94c, Rodriguez-Vazquez et al. 96] Espejo et al. describe cellular neural network-based (CNN) chips designed for image processing. The main advantage of CNN for VLSI implementation is the locality of interconnections. Each cell in a CNN only connects to its nearest neighbors. The modified equation governing the behavior of a CNN is given by:

\[ \tau \frac{dx_i}{dt} = -g[x_i(t)] + D_i + \sum_{j \in \text{Neighborhood}} \left( A_{ij}y_j(t) + B_{ij}u_j \right) \]

where \( g() \) is a nonlinear term defined by:

\[ g(x_i) = \begin{cases}
mx_i + m - 1 & x_i < -1 \\
x_i & -1 \le x_i \le 1 \\
mx_i - m + 1 & x_i > 1
\end{cases} \]

\[ y_i = \frac{1}{2} (|x_i + 1| - |x_i - 1|) \]

Different terms in these equations can be easily implemented using current mode circuits. The \( dx/dt \) term has been implemented using a current mode integrator in the feedback loop.
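To make the dynamics concrete, the sketch below integrates the state equation numerically for a one-dimensional array. It is an illustration under several assumptions that are not taken from the chips: a 1-D array with three-tap templates `A` and `B` (left, self, right), a single offset `D` shared by all cells, zero contribution from cells outside the array, and forward-Euler integration with hypothetical parameter values.

```python
def g(x, m):
    """Piecewise-linear loss term: slope 1 inside [-1, 1], slope m outside."""
    if x < -1.0:
        return m * x + m - 1.0
    if x > 1.0:
        return m * x - m + 1.0
    return x


def output(x):
    """Saturating cell output y = (|x + 1| - |x - 1|) / 2."""
    return 0.5 * (abs(x + 1.0) - abs(x - 1.0))


def cnn_step(x, u, A, B, D, m, tau, dt):
    """One forward-Euler step of tau * dx_i/dt = -g(x_i) + D + sum(A*y + B*u)."""
    n = len(x)
    y = [output(v) for v in x]
    new_x = []
    for i in range(n):
        drive = D
        for k, offset in enumerate((-1, 0, 1)):
            j = i + offset
            if 0 <= j < n:  # cells outside the array contribute nothing
                drive += A[k] * y[j] + B[k] * u[j]
        dx = (-g(x[i], m) + drive) / tau
        new_x.append(x[i] + dt * dx)
    return new_x


# Hypothetical three-cell example driven only through the input template.
state = cnn_step(x=[0.0, 0.0, 0.0], u=[0.0, 1.0, 0.0],
                 A=(0.0, 0.0, 0.0), B=(0.0, 1.0, 0.0),
                 D=0.0, m=5.0, tau=1.0, dt=0.1)
print(state)
```

Note that `g` is continuous at the breakpoints: both outer branches evaluate to -1 and +1 at $x_i = -1$ and $x_i = 1$, matching the inner branch.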

Two special chips with different connections and weights have been designed and fabricated in a 1.6\( \mu \)m 1P-2M CMOS process. The first one contains a 16\( \times \)16 array for detecting connected components (DCC). The chip measures 2.5mm\( \times \)2.5mm and dissipates 42 mW. Each cell in this chip occupies 118\( \mu \)m \( \times \)96\( \mu \)m. The second chip performs the Radon transform on a 16\( \times \)16 image. In this chip the input can be selected from external or internal (optical) sources. The chip operates in sampled-data mode. The chip area is 2.67mm\( \times \)2.68mm, and its power dissipation is 330 mW. Cell dimensions are 121\( \mu \)m \( \times \)124\( \mu \)m.

Both chips use the multisensitivity photodetector in Darlington mode (See Figure 2.23). The circuits operate in the above-threshold region (as opposed to subthreshold) to achieve better matching. However, other factors such as power dissipation and transistor sizing have been traded off.

A later design with more flexibility and functionality uses a 0.8\( \mu \)m process with 20\( \times \)22 cells in an area of 30mm\(^2\) [Rodriguez-Vazquez et al. 96]. This chip also uses Darlington-connected phototransistors. The weights can be loaded into the array, so the chip can be programmed to perform different functions. It has successfully demonstrated operations such as low-pass image filtering, corner and border extraction, hole filling, and motion detection.
3.27 Moini et al.’s Shunting Inhibition Vision Chip

Shunting inhibition (SI), or multiplicative lateral inhibition, is known to be one of the models of the retina which reproduces many of its functional behaviors. In order to investigate the properties of a silicon implementation of shunting inhibition, a chip containing feedback and feedforward shunting inhibition models has been implemented [Moini et al. 97a]. This design uses current mode techniques and subthreshold circuits to implement the complete SI equation, including the temporal component:

\[ \frac{d\epsilon_i}{dt} = I_i - b\epsilon_i - k\epsilon_i(\epsilon_{i-1} + \epsilon_{i+1}) \qquad (3.4) \]

where \( \epsilon_i \) is the output of cell \( i \), \( I_i \) is the input, \( b \) is a decay factor, and \( k \) is the inhibition factor. The building block of a cell in the feedback SI is shown in Figure 3.36. In the feedforward circuit copies of the input current of the neighboring cells are involved in the inhibition, instead of copies of the output currents. The current mode temporal differentiator circuit is shown in Figure 3.37. The circuit is based on the current delay using an OTA-C element. The multiplier/divider circuit is the same as that shown in Figure 3.15 in section 3.11.
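The behavior of the feedback SI equation (3.4) can be explored numerically. The sketch below is a software illustration only: the forward-Euler integration, the zero boundary condition for missing neighbors, and all parameter values are assumptions chosen for the example.

```python
def si_step(eps, inputs, b, k, dt):
    """One forward-Euler step of the feedback shunting inhibition equation."""
    n = len(eps)
    new_eps = []
    for i in range(n):
        left = eps[i - 1] if i > 0 else 0.0       # no neighbor beyond the array edge
        right = eps[i + 1] if i < n - 1 else 0.0
        deps = inputs[i] - b * eps[i] - k * eps[i] * (left + right)
        new_eps.append(eps[i] + dt * deps)
    return new_eps


def settle(inputs, b=1.0, k=1.0, dt=0.01, steps=5000):
    """Integrate from rest until the array has (approximately) settled."""
    eps = [0.0] * len(inputs)
    for _ in range(steps):
        eps = si_step(eps, inputs, b, k, dt)
    return eps


# Hypothetical uniform input on three cells: the center cell has two
# neighbors inhibiting it, so it settles lower than the edge cells.
steady = settle([1.0, 1.0, 1.0])
print(steady)
```

For this three-cell case with $b = k = 1$ and unit inputs, setting the derivative to zero gives a center output of $\sqrt{2} - 1$ and edge outputs of $1/\sqrt{2}$, which the integration approaches.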

The chip contains several 64×1 arrays of different implementations of the SI circuits. It has been fabricated in a 2P-2M 2\( \mu \)m CMOS process, in an area of 4.6\( \text{mm} \times 6.8\text{mm} \). The height of each cell is 57\( \mu \text{m} \).

![Figure 3.36: Moini et al.’s feedback shunting inhibition circuit.](image-url)
3.28 Etienne-Cummings et al.’s Motion Detector Chip

Etienne-Cummings’ motion detection chip uses a modified Reichardt algorithm [Etienne-Cummings et al. 97b]. In this algorithm, called temporal domain optical flow measurement (TDOFM), motion is detected by locating the zero-crossings and determining their appearance or disappearance. Velocity can be determined from the time between an edge disappearing at a pixel and appearing at a neighboring pixel. The block diagram of the TDOFM algorithm is illustrated in Figure 3.38.

The input image is first applied to an edge detector circuit, which subtracts the image from a spatially smoothed version of itself. The spatial smoothing is performed by a passive resistive network. The output of the edge detector is converted to a 1-bit binary signal using a comparator, with its reference point set to the zero level of the edge detector. The binary edges are applied to positive and negative temporal differentiators, which generate pulses on the appearance (positive \( \frac{d}{dt} \)) and disappearance (negative \( \frac{d}{dt} \)) of an edge. A circuit takes the pulses from two neighbors and generates pulses whose width indicates the time between the appearance and disappearance of the edge. It also integrates these pulses, and the output represents the velocity of the edge. The schematic diagram of the implemented circuit is shown in Figure 3.39.

Two chips with 1×9 and 5×5 arrays have been fabricated in a 2\( \mu \)m CMOS process in TINY chips (2.3mm\( \times \)2.3mm).

Figure 3.37: a) Current mode differentiator. b) Current mode divider/multiplier.

Figure 3.38: Block diagram of Etienne-Cummings’ motion detection chip. Blocks, in order: photodetector; zero-crossing edge detector; positive and negative edge motion signals; correlator and speed measurement.

Figure 3.39: Circuit diagram of Etienne-Cummings’ motion detection chip. The input is labeled “From edge detection circuit”.
3.29 CSEM’s Motion Detector Chip for Pointing Devices

This chip is the first commercial motion detection chip designed for use in computer mice [Arreguit et al. 96a, Arreguit et al. 96b]. Its success is achieved by several simplifying conditions, such as controlling the illumination range, using a binary B&W image with sharp contrast, minimizing the use of analog circuits to basic building blocks, and performing the tasks of motion detection and displacement measurement using 1-bit digital circuits.

The chip in fact detects the global displacement of the image, which is a pattern of dots with random size and position. The block diagram of the pixel circuit is shown in Figure 3.40. The chip first finds the edges ($E_y$ and $E_x$) and the sign of the gradient of the edges ($S$) in the horizontal and vertical directions. As the patterns are binary, the edges and the gradient signs are detected by simply comparing the output photocurrents of two neighboring cells. In order to reduce the effect of mismatch and noise, two comparisons involving the factor $a$ (which is 2 in this hardware implementation) are performed.

$$
\begin{aligned}
E_y &= 0 \quad \text{if } I_y > aI_{y+1} \text{ and } aI_y > I_{y+1} \\
E_x &= 0 \quad \text{if } I_x > aI_{x+1} \text{ and } aI_x > I_{x+1} \\
S &= 0 \quad \text{if } I_x > aI_{x+1} \text{ and } I_y > aI_{y+1}
\end{aligned}
$$

where $E_y$ and $E_x$ indicate the presence of an edge in the vertical and horizontal directions, respectively, when they are “0”. $S$ is the sign of the gradient. Note that it has a mixture of both $x$ and $y$ components. In fact $S$ is only used as a simple confidence measure to avoid spatio-temporal aliasing from small size patterns, or very fast movements.

The chip then uses the present and previous values of the edge and gradient to detect whether there has been a displacement in any of the four directions. The circuit used for finding the downward displacement is shown in Figure 3.41. Similar circuits are used for the other directions.

The output currents of all pixels associated with each direction are summed. An on-chip ADC and some other analog circuits find the global displacement in the $x$ and $y$ directions. The chip also comprises some other digital control and interface circuits for connection to a standard serial port.

The fabricated chip contains 93 photodiodes and 75 pixels of processing elements. It has been fabricated in a $2\mu$m low power, low-voltage CMOS process, in an area of $4.4\times4.3\text{mm}^2$. 

Figure 3.40: Block diagram of the pixel circuit.

Figure 3.41: Displacement detection circuit for the downward direction.
Chapter 4

Analog VLSI Chips for Vision Processing

4.1 Introduction

In addition to the many single-chip imagers and processors, there are several analog chips designed for image processing. In these chips it is assumed that the image has already been acquired using an imager. These chips cannot be regarded as vision chips, but as they present examples of dedicated analog VLSI hardware for implementing vision algorithms, they have been included in this report. There are many analog and digital chips designed for realizing neural network architectures. Some of these chips have been applied to vision algorithms as well. Here, however, we only consider those implementations that were originally designed for performing a vision task.
4.2 Hakkarainen & Lee’s AVD CCD Chip for Stereo Vision

In [Hakkaranien and Lee 93, Hakkaranien et al. 91] Hakkarainen and Lee describe a CCD/CMOS chip for computing one of the stages in processing stereo vision. In implementing the Marr-Poggio-Drumheller algorithm for stereo vision, four processing steps are required.

1. Enhancing the image and image features
2. Computing the match data for each pixel in the left and right images
3. Computing the inhibitory and excitatory weights for neighbors
4. Selecting the best match according to the weights obtained from the previous stage.

The chip only implements the second step. It comprises a CCD input stage, CCD shift registers, a floating-gate output stage for non-destructive read out, an absolute value of difference (AVD) circuit, a CCD memory for storing the output of the AVD stage, and a floating-diffusion output stage. The architecture is shown in Figure 4.1. The main part, the AVD cell, generates the absolute value of the difference of two pixels from the left and right images. It is composed of two cross coupled fill-and-spill CCD circuits, shown in Figure 4.2. Considering only the CCD elements connected to VL and VR, it can easily be seen that when VR > VL the potential well of the left fill-and-spill circuit retains a charge proportional to the difference of VL and VR, while the right fill-and-spill circuit becomes empty.

The chip has a 40×40 array of match generators. It has been designed using a 2μm CCD/CMOS process in an area of 7.4mm×8.7mm, and dissipates 450mW.
Figure 4.1: Architecture of Hakkarainen-Lee’s vision chip.

Figure 4.2: The AVD cross section showing potential wells when VR > VL.
4.3 Erten’s CMOS Chip for Stereo Correspondence

Stereo correspondence between two images of a scene captured at different angles requires a similarity (or dissimilarity) measure between pixels in each image. Instead of using conventional distance measures, such as Euclidean distance and city block distance, Gamze Erten has used a hardware measure based on Delbrück’s bump circuit shown in Figure 4.3 [Erten 93, Erten and Goodman 96, Erten and Salam 96].

\[ I_{out} = \frac{1}{1 + \frac{4}{\sigma} \cosh(V_1 - V_2)} \qquad (4.1) \]

Erten shows that this similarity measure gives a better statistical distribution than the other two measures.

This stereo correspondence chip receives two 1D images and finds the disparity. The chip architecture is shown in Figure 4.4. The winner-take-all circuit is followed by a position encoder, which finds the position of the winner cell. In order to increase the confidence in the output of the operation, a “confidence circuit” is included which checks the value of a confidence metric against a threshold. The confidence metric used in the implementation is the value of the winner cell divided by the sum of all outputs of the bump array. The schematic diagrams of the WTA and confidence circuits are shown in Figure 4.5.
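The matching, winner-take-all, and confidence steps can be sketched as follows. This is an illustrative software analogue under stated assumptions: the bump similarity uses equation (4.1) as printed, with inputs in normalized units, the way similarities are summed per candidate disparity is an assumption about the array organization, and the function names, `sigma`, and the threshold are hypothetical.

```python
import math


def bump(v1, v2, sigma=4.0):
    """Similarity measure of equation (4.1); largest when v1 equals v2."""
    return 1.0 / (1.0 + (4.0 / sigma) * math.cosh(v1 - v2))


def stereo_match(left, right, conf_threshold=0.1):
    """Return (winning disparity, confidence metric, confident?)."""
    n_disp = len(left) - len(right) + 1
    scores = []
    for d in range(n_disp):
        # Assumed organization: sum the bump outputs over the right image
        # for each candidate disparity d.
        scores.append(sum(bump(left[d + i], right[i]) for i in range(len(right))))
    winner = max(range(n_disp), key=lambda d: scores[d])  # winner-take-all
    confidence = scores[winner] / sum(scores)             # winner / sum of all
    return winner, confidence, confidence >= conf_threshold


# Hypothetical inputs sized like the chip: 19 left pixels, 9 right pixels.
left = [0.5 * i for i in range(19)]
right = left[4:13]  # the right image is the left image displaced by 4 pixels
winner, confidence, ok = stereo_match(left, right)
print(winner, confidence, ok)
```

With nineteen left inputs and nine right inputs there are eleven candidate disparities in this arrangement.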

The chip has been designed and fabricated in a 2\( \mu \)m 2P-2M CMOS process. It has nine inputs for the right image and nineteen inputs for the left image. The design is fitted in a TINY chip (2.2mm\( \times \)2.2mm). Successful test results have been reported in the references.

![Figure 4.3: Delbrück’s bump circuit for measuring the similarity between two values [Delbrück 91, Delbrück 93a].](image-url)
Figure 4.4: Architecture of Erten’s stereo correspondence chip. The outputs of the cells in the shaded areas are not used in the comparison operation to avoid edge effects.

Figure 4.5: Top: the WTA circuit with position encoder. Bottom: The confidence circuit.
Chapter 5

Optical Neuro Chips

5.1 Mitsubishi Electric’s Optical neurochip and retina

The optical neurochips and the retina designed at Mitsubishi Electric Co. are based on optical interconnection and processing of the image using special GaAs photodetectors and light emitting diodes (LED) integrated on the same chip [Nitta et al. 92, Nitta et al. 93a, Nitta et al. 93c, Nitta et al. 93b, Nitta et al. 95, Lange et al. 93, Lange et al. 95, Lange et al. 94, Ohta et al. 89, Oita et al. 94, Oita et al. 93]. The chips do not implement any particular vision processing algorithm. However, they utilize interesting features of GaAs variable sensitivity photodetectors (VSPD) and LEDs and combine them in a single chip. Using integrated VSPD and LEDs the optical crosstalk between adjacent channels is significantly reduced, with respect to previously reported multichip optical neurochips.

The photodetectors are designed using metal-semiconductor-metal (MSM) structures. By applying a positive or negative voltage to the metal electrodes of the detector, its sensitivity can be varied correspondingly and a positive or negative output current can be obtained. The photodetectors also show an analog memory behavior, which is very useful for synaptic weight storage in the implementation of neural networks.

In the optical neurochips the inputs are an array of analog signals controlling the intensity of light emitting diodes underneath the VSPD elements. Therefore, the photodetectors modulate the internal image with the control voltage.

The GaAs retina chips described in [Lange et al. 93, Lange et al. 95, Lange et al. 94] are based on devices and structures used in the optical neurochips. The input image is detected by VSPDs. The image is then vector multiplied by the photodetector control voltage. The vector multiplication is performed by: selecting the desired pixels, applying the proper kernel (control voltages) to these pixels, and reading the output. The output currents of the VSPD elements in one column are added together. The architecture of the retina chip is shown in Figure 5.1.
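The vector multiplication can be sketched as a weighted column sum. This is a simplified model under assumptions not stated in the text: each row of VSPDs shares one control voltage, the sensitivity is taken to be proportional to that voltage, and unselected rows are given a control voltage of zero.

```python
def vspd_readout(image, control):
    """Column output currents of a VSPD array.

    image[r][c]: light intensity at row r, column c.
    control[r]: control voltage (signed sensitivity) applied to row r.
    Each column output is the sum of intensity * sensitivity down the column.
    """
    n_cols = len(image[0])
    out = [0.0] * n_cols
    for row, v in zip(image, control):
        for c in range(n_cols):
            out[c] += v * row[c]
    return out


# Hypothetical 3x2 image with a difference kernel applied to rows 1 and 2.
image = [[1, 2], [3, 4], [5, 6]]
control = [0, 1, -1]  # row 0 is not selected
currents = vspd_readout(image, control)
print(currents)
```

Because the sensitivity can take either sign, kernels with negative coefficients, such as the difference kernel above, need no separate subtraction stage in this model.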

Several neurochips with different sizes have been fabricated. Two retinas with 64×64 pixels (with a cell size of 160μm) and 128×128 pixels (with a cell size of 80μm) have also been designed and fabricated. The larger chip fits in a 14.3mm×14.3mm die. Some image processing operations, such as edge detection, feature extraction, and spatial smoothing, have been demonstrated using these chips.
Figure 5.1: Lange et al.’s GaAs retina.
5.2 Yu et al.’s optical neurochip

The function of many vision chips may be viewed as modulating the input light intensity using some circuits. By utilizing spatial light modulators (SLM), part of this modulation can be performed by these devices. The general idea is in fact the same as that of the variable sensitivity detectors (VSD) described in section 5.1. The two concepts are illustrated in Figure 5.2. In SLMs a layer of ferroelectric liquid crystal (FLC) material is encapsulated between the chip and a glass cover. The main advantages of “FLC on silicon” over GaAs optical neurochips are its higher contrast and lower cost [Yu et al. 95a, Yu et al. 96a, Yu et al. 96b, Yu et al. 95b].

The schematic diagram of a pixel of the chip designed by Yu et al. is shown in Figure 5.3. X, T, Y, and W are the input, target output, actual output, and weight of the pixel, respectively. Using this circuit an iterative delta learning rule is implemented. The five-transistor OTA computes the required ΔW from the target and actual outputs, and changes the voltage at the LC pad. The LC pad is a square metal pad from which the input light, X, is reflected and at the same time modulated by the orientation of the FLC crystals, which depends on the voltage applied to the pad.
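As a rough software analogue of the iterative delta learning performed in each pixel, the sketch below applies the textbook delta rule, ΔW = η(T − Y)X, to a single weight. The linear pixel model Y = W·X, the learning rate η, and the function name `delta_rule_step` are assumptions made for illustration; the actual OTA transfer characteristic is not modeled.

```python
def delta_rule_step(w, x, target, rate=0.1):
    """One iteration of the delta rule for a single linear 'pixel'.

    w      -- current weight (stands in for the LC pad modulation)
    x      -- input light intensity X
    target -- target output T
    rate   -- learning rate (hypothetical value)
    Returns the updated weight and the output Y computed before the update.
    """
    y = w * x                       # assumed linear pixel response
    dw = rate * (target - y) * x    # delta rule: correction proportional to the error
    return w + dw, y


w = 0.0
for _ in range(200):
    w, y = delta_rule_step(w, x=0.8, target=0.4)
# After enough iterations w * x approaches the target, i.e. w approaches 0.4 / 0.8.
```

The update shrinks the error geometrically, which is why a modest number of iterations suffices in this toy setting.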

The chip named SASLM2 has been fabricated in a 2μm CMOS process. It has an array of 64×64 cells with a 160μm pitch.

![Schematic diagram of a pixel of the chip designed by Yu et al.](image)
Figure 5.3: Pixel circuit of Yu et al.'s optical neuro-chip.
Chapter 6

Active Pixel Sensors

6.1 Introduction

Many APS have been reported. In the majority of these sensors there are only very minor changes in the pixel circuits or the architecture. In this section we review only those sensors which have used standard CMOS processes, or have implemented some extra circuits at the pixel level.

Before reading the description of each of these sensors the reader is encouraged to read the material in section 7.4, where some of the basic concepts and circuits used in the design of APS are described.

6.2 JPL’s active pixel sensors

Part of the ongoing research at the Jet Propulsion Laboratory (JPL) at Caltech has concentrated on developing imaging devices using CMOS processes for special applications, such as star tracking and large dynamic range, low power astronomical imaging. The pixel circuitry used in almost all of these sensors is described in section 7.4. Here we briefly review these sensors.
6.3 Fowler et al.’s pixel level ADC sensor

Analog to digital conversion in almost all vision chips and APS is performed at the column or chip level, because incorporating an ADC at the pixel level consumes a large area. However, for special applications, where the rate of data conversion and pixel readout is very high, such an approach may prove beneficial [Fowler et al. 94, Fowler 95]. Some disadvantages can nevertheless degrade the performance of this sensor: the increased amount of data and the need for image reconstruction from the data, the introduction of high speed clocks running all across the chip, and the digital noise introduced by these clock signals.

In order to implement a feasible area efficient ADC for each pixel a one-bit first order delta-sigma ADC, shown in Figure 6.1, has been used in this sensor. The circuit simply tries to reduce the error between the analog output of the circuit and the input by averaging this error through a succession of clock cycles. The number of clock cycles required to achieve a desired signal-to-quantization noise ratio is given by [Fowler 95]:

\[ N = 2^{(\frac{SNR-5.2dB}{2})} \]  

Typically more than 60 clock cycles are required to achieve an SNR of around 50dB. This means that a large quantity of data is produced during a full conversion. The digital values still need to be reconstructed using a decimation filter.
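The behavior of a one-bit first order sigma-delta converter can be illustrated with a short behavioral model in Python. This is a generic textbook modulator, not a model of the circuit in Figure 6.1: the input normalization to the range 0 to 1, the comparator threshold of 0.5, and the use of a plain average as the decimation filter are all assumptions chosen for simplicity.

```python
def sigma_delta_bits(x, n_cycles):
    """First-order one-bit modulator for a constant input x in [0, 1].

    Each clock cycle the integrator accumulates the error between the input
    and the previous output bit; the comparator then produces the next bit.
    """
    integrator = 0.0
    feedback = 0.0
    bits = []
    for _ in range(n_cycles):
        integrator += x - feedback          # accumulate the error
        bit = 1 if integrator >= 0.5 else 0  # one-bit quantizer
        bits.append(bit)
        feedback = float(bit)               # one-bit DAC in the feedback path
    return bits


def decimate_by_averaging(bits):
    """Simplest possible decimation filter: the mean of the bit stream."""
    return sum(bits) / len(bits)


bits = sigma_delta_bits(0.3, 64)
estimate = decimate_by_averaging(bits)
```

For a constant input the averaged bit stream converges on the input value as the number of clock cycles grows, which is the trade-off between conversion time, data volume, and resolution mentioned above.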

The schematic diagram of the pixel circuit is shown in Figure 6.2. A 64×64 array of this circuit has been implemented in a 0.8μm 1P-3M CMOS process. Each pixel occupies an area of 30μm × 30μm. A redesign of this sensor uses multiplexed ADC for every four pixels, and has 128×128 pixels each with an area of 20.8μm × 19.8μm [Yang et al. 96].

![Block diagram of Fowler’s pixel level first order one-bit sigma-delta ADC. The clock rate of the Phi1 and Phi2 is much higher than the frame rate.](image1.png)
6.4 Technion’s Adaptive Sensitivity CCD Imager

Achieving a large dynamic range requires adaptation techniques for individual pixels. Global adaptation, although it improves the dynamic range, is still limited in many situations where different parts of the image are very bright or very dark. The adaptive sensitivity CCD imager designed by the VLSI group at Technion improves the dynamic range by controlling the integration time of each individual sensor [Chen and Ginosar 95].

The pixel circuit with a set-reset flip-flop (SR-FF) is illustrated in Figure 6.3. The main difference between this photocircuit and a conventional photogate-based photocircuit is the addition of the SR-FF.

The sensor is exposed to the light at several different integration times (for example with multiples of 1, 8, and 64). First all the SR-FFs in the pixels are set using the “global set” signal. Then the shortest integration is performed. If a pixel is detected to be saturated during any integration cycle, or is likely to saturate in the next cycle, its associated flip-flop is reset for the following cycles. Resetting pixels which could saturate has the additional benefit of avoiding blooming. After the integration cycles are finished the image is read out using interline CCD transfer.
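The per-pixel decision sequence can be sketched as a small simulation. The model below makes several assumptions that are not stated in the text: each integration cycle starts from an empty well, the "likely to saturate" test simply scales the current charge by the ratio of the next integration time to the present one, and the full-well value and photocurrent rates are arbitrary illustrative numbers.

```python
def adaptive_exposure(rates, times=(1, 8, 64), full_well=100.0):
    """Simulate per-pixel integration-time selection with a set-reset flag.

    rates     -- photo-generated charge per unit time for each pixel
    times     -- integration times, shortest first
    full_well -- saturation charge of a pixel (hypothetical value)
    Returns a list of (charge, integration_time, intensity_estimate) per pixel.
    """
    results = []
    for rate in rates:
        enabled = True                      # "global set" of the flip-flop
        charge, used = 0.0, times[0]
        for i, t in enumerate(times):
            if not enabled:
                break                       # flip-flop was reset: keep the earlier value
            charge, used = min(rate * t, full_well), t
            next_t = times[i + 1] if i + 1 < len(times) else None
            saturated = charge >= full_well
            will_saturate = next_t is not None and charge * next_t / t >= full_well
            if saturated or will_saturate:
                enabled = False             # reset the flip-flop for later cycles
        results.append((charge, used, charge / used))
    return results


pixels = adaptive_exposure([0.5, 5.0, 50.0, 500.0])
```

Dim pixels end up using the longest integration time and bright pixels the shortest, so the ratio of charge to integration time recovers the intensity over a wider range than any single exposure.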

The fabricated prototype chip has 9×16 pixels in a 2μm CMOS/CCD process, in a TINY chip (2.22mm×2.22mm).

Figure 6.2: The schematic diagram of Fowler’s pixel level ADC sensor.
6.5 Technion’s TDI CCD sensor

Pixel-level adaptive sensitivity technology¹ enables image sensors to acquire wide dynamic range scenes without loss of detail, by adjusting the sensitivity of individual pixels according to the intensity of light incident upon them. An adaptive sensitivity time delay and integrate (TDI) CCD sensor has been designed by the VLSI Group at Technion [Chen and Ginosar 96].

The sensor comprises 18 TDI integration stages, with a horizontal resolution of 32 pixels. The level of charge integrated in each pixel is monitored as the pixel charge packet progresses across the TDI array. If the charge accumulates to above a certain threshold level, the pixel is discharged. The architecture of the sensor is shown in Figure 6.4. The “conditional” reset mechanisms are inserted after the thirteenth and seventeenth stages. Thus, each individual pixel may be integrated over 1, 5, or all 18 stages. Since in TDI scanning there is no concept of “frames” and each pixel is imaged only once, the intensity sensing and the decision on how long to integrate must be performed “on the fly”. But, while in regular linear sensors the perpendicular fill factor is unlimited and complex control circuits may be placed next to the detectors, the two dimensional nature of TDI sensors presents much more demanding architectural and circuit challenges.
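The effect of the two conditional reset points can be checked with a short behavioral sketch. The model assumes that every stage adds the same amount of charge and that both reset points use the same threshold; the threshold and rate values are hypothetical.

```python
def tdi_pixel(rate, threshold, n_stages=18, reset_after=(13, 17)):
    """Propagate one charge packet through the TDI stages.

    rate        -- charge added per stage (assumed constant)
    threshold   -- charge level above which a conditional reset discharges the packet
    reset_after -- stages after which a conditional reset mechanism is placed
    Returns (final_charge, number_of_stages_effectively_integrated).
    """
    charge, stages = 0.0, 0
    for stage in range(1, n_stages + 1):
        charge += rate
        stages += 1
        if stage in reset_after and charge > threshold:
            charge, stages = 0.0, 0         # conditional reset: start over
    return charge, stages


dim = tdi_pixel(rate=1.0, threshold=20.0)
medium = tdi_pixel(rate=3.0, threshold=20.0)
bright = tdi_pixel(rate=10.0, threshold=20.0)
```

With resets placed after stages 13 and 17, the only possible effective integration lengths are 18, 5, and 1 stages, in agreement with the description above.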

The chip has been fabricated in a 2µm CMOS/CCD process, in a TINY chip (2.22mm × 2.22mm).

---

¹With the permission of authors most of this section has been copied from the abstract of the referenced paper in [Chen and Ginosar 96], as it clearly and concisely describes the sensor.
Figure 6.4: Architecture of Technion’s TDI sensor.
Chapter 7

Designing Vision Chips: Principles and Building Blocks
7.1 Introduction

Detecting the light intensity and transducing it to an electrical parameter (voltage or current), and subsequently processing the signals from an array of detectors in the spatial and/or temporal domain, are the primary tasks of vision chips. Understanding the physical and electronic principles playing a role at each stage is of great importance in designing vision chips and developing systems based on them. Although a significant amount of work has been done in this area, there is little literature concerning a systematic approach to the design of vision chips. The bulk of such literature would consist of the principles of phototransduction and of spatial and temporal processing. The present document tries to initiate and inspire such an effort and looks forward to a more established future for vision chips.

This chapter is organized as follows. In section 7.2 the first elements of phototransduction, i.e. photodetector devices, are presented. Analytic expressions for the quantum efficiency of different detector structures in standard processes are derived in section 7.2.1. Section 7.3 describes some of the circuits for phototransduction, or photocircuits\textsuperscript{1}. Spatial processing principles, techniques, and circuits are discussed in section 7.5. Temporal and spatio-temporal processing methods are described in section 7.6. Adaptation mechanisms for extending the dynamic range of the system are discussed in section 7.7. Some of the problems in designing vision chips, such as mismatch and digital noise, are addressed in section 7.8. Finally, basic circuits and design techniques for active pixel sensors (APS) are presented in section 7.4.

\textsuperscript{1}I have adopted the term “photocircuit” because of its clear and sharp reference to a circuitry which processes the photocurrent or photovoltage. Other terms, such as “photoreceptor”, have been interchangeably used both for single photodetectors and the circuitry used for processing photocurrents, and in a context full of these references become confusing.
7.2 Phototransduction, the Doorway to Vision Chips

Photodetectors are the doorway to vision chips. Any imperfection at this stage, with respect to the desired characteristics, either cannot be compensated even with a priori knowledge, or can be compensated only at a high computational cost. The characteristics of the detectors, such as bandwidth, noise, linearity, and dynamic range, directly affect the performance of the system. Therefore, it is highly desirable to have as perfect a photodetector as possible. Unfortunately, there is no flexibility or choice of photodetector devices in standard processes. In more than 90% of the vision chips reported, photodetectors have been realized using parasitic elements found in standard processes. Fortunately, these parasitic devices have not put severe limitations on the processing capabilities of vision chips so far. Junction photodiodes, for example, have a linear behavior over a large dynamic range of more than 7 decades, with reasonable sensitivity to the visible light spectrum.

The inflexibility of photodetectors may be overcome by design ideas in photocircuits. Static and dynamic characteristics of phototransduction can be improved by clever photocircuits. Active pixel sensors are a clear example of this idea. They can be regarded as photocircuit-only vision chips, and can be a useful resource in the design of vision chips.

7.2.1 Photodetector Elements

In this section a general view of the various photodetector elements (PD) which have been used for vision chips is provided. This view is limited to those photodetectors available in standard processes; in other words, only parasitic elements which can be utilized as photodetectors. In CCD, CMOS, and GaAs processes at least one form of junction diode can be found that can serve as a photodetector. CCD processes have more mature forms of photodetectors simply because of the large demand for CCD cameras. CMOS and GaAs, on the other hand, have been paid less attention in the profitable world of imaging. However, the advancement of vision chips in industrial applications and the very resourceful circuit and device libraries of CMOS processes are going to change this and attract more attention.

In a standard CMOS process, either p-well or n-well, several parasitic junction devices can be used as photodetecting elements, depicted in Figure 7.1. The first three structures are junction diodes, the fourth one is a parasitic vertical bipolar transistor, and the fifth structure is capable of bidirectional photocurrent generation depending on the voltage across the device. The fifth structure can also be considered as a lateral bipolar transistor with symmetric emitter and collector. The last structure is a photogate which in fact has been borrowed from CCD processes. The photogenerated charges are stored in a potential well, produced by applying a large voltage to the gate of the device.

Each of these structures can be analyzed in detail relatively easily. Here we analyze each device and derive equations for its quantum efficiency as a function of its geometrical and metallurgical parameters. Note that several simplifying assumptions are made for each structure. The general assumptions made in all the derivations in the following sections are:

- Abrupt junctions with rectangular depletion regions.
- One dimensional current flow. This would not be true for minimum size devices, where vertical and horizontal dimensions are comparable.
- No high-level injection. This becomes important for very high intensity applications, for example melting furnace or welding inspection.
- No degeneration in highly doped diffusion regions.
- No recombination in depletion regions.
- No surface recombination. This parameter is especially important for lateral devices and for the photogate, where there is a significant amount of active carriers close to the surface. In vertical devices, the processes which determine the characteristics of the device depend only on the parameters of the bulk semiconductor.
- No surface reflectance.
- No diffusion in the bulk substrate. This is important for near infra-red detectors, as most of the carrier generation happens close to the bulk substrate.

Figure 7.1: a) well-substrate junction diode, b) diffusion-well diode, c) well-substrate and diffusion-well diodes in parallel, d) vertical bipolar transistor, e) bidirectional photodetector, and f) photogate.

There are also some other assumptions made for each device which will be explained individually when treating each device.

In order to improve the consistency between the simulation results from the derived equations and real measured data, one can take the effects neglected above into account. However, the derived equations can still provide good insight into the operation of these devices, and illustrate the effect of different parameters on the quantum efficiency. Also, for almost all processes there are no accurate data available for the physical and metallurgical parameters of the process. Therefore, it is rather unnecessary to be concerned about some of these effects. In the extreme case one can use device simulation software to derive the device characteristics numerically.

### 7.2.2 Quantum Efficiency of a Vertical Junction Diode

For the structure shown in Figure 7.2, the photocurrent is composed of two components: the drift current due to the drift of holes and electrons in the depletion region, and the diffusion current due to the diffusion of carriers outside the depletion region ([Moini 94]).

![Figure 7.2: The structure of a junction photodetector.](image)

- $x_j$ is the metallurgical junction depth,
- $W$ is the width of the depletion region,
- $x_{epi}$ is the thickness of the epitaxial layer.
The drift current in the depletion region is:

\[ J_{\text{drift}} = -q \int_{x_j-x_p}^{x_j+x_n} G(x) \, dx \tag{7.1} \]

where \( G(x) \) is the carrier generation rate for an incident photon flux, \( \Phi_0 \), in a semiconductor with an absorption coefficient of \( \alpha \), and is given by

\[ G(x) = \alpha \Phi_0 e^{-\alpha x} \tag{7.2} \]

Hence

\[ J_{\text{drift}} = -q \Phi_0 e^{-\alpha(x_j-x_p)} (1 - e^{-\alpha W}) \tag{7.3} \]

where \( W = x_n + x_p \). \( x_n \) and \( x_p \) are the depletion region extents in the n and p sides of the junction and are given by

\[ x_n = \sqrt{\frac{2\epsilon(V_0+V_r)}{q} \left( \frac{N_A}{N_D(N_A+N_D)} \right)} \]
\[ x_p = \sqrt{\frac{2\epsilon(V_0+V_r)}{q} \left( \frac{N_D}{N_A(N_A+N_D)} \right)} \tag{7.4} \]

where \( \epsilon \) is the permittivity of silicon, \( V_r \) is the reverse bias voltage applied to the junction, and \( V_0 \) is the built-in potential of the junction and is equal to

\[ V_0 = \frac{kT}{q} \ln \frac{N_A N_D}{n_i^2} \tag{7.5} \]

The diffusion component of the current can be found from the diffusion equation:

\[
\begin{align*}
D_p \frac{\partial^2 p_n}{\partial x^2} - \frac{p_n-p_{n0}}{\tau_p} + G(x) &= 0 \quad \text{in the N-substrate} \\
D_n \frac{\partial^2 n_p}{\partial x^2} - \frac{n_p-n_{p0}}{\tau_n} + G(x) &= 0 \quad \text{in the p-well}
\end{align*}
\]

where \( D_n \) and \( D_p \) are the diffusion coefficients of the minority carriers, \( \tau_p \) and \( \tau_n \) are the lifetimes of excess carriers, and \( p_{n0} \) and \( n_{p0} \) are the equilibrium minority carrier densities. The above equation can be solved under the boundary conditions \( p_n \big|_{x=x_{\text{epi}}} = 0 \), \( p_n \big|_{x=x_j+x_n} = 0 \), \( n_p \big|_{x=0} = n_{p0} \), and \( n_p \big|_{x=x_j-x_p} = 0 \) to obtain

\[
\begin{align*}
p_n(x) &= p_{n0} + A e^{\frac{x}{L_p}} + B e^{-\frac{x}{L_p}} + C e^{-\alpha x} \\
n_p(x) &= n_{p0} + D e^{\frac{x}{L_n}} + E e^{-\frac{x}{L_n}} + F e^{-\alpha x}
\end{align*}
\]

where \( L_p \) and \( L_n \) are the diffusion lengths of excess carriers, and

\[
\begin{align*}
A &= \frac{-\left(p_{n0} + C e^{-\alpha x_{\text{epi}}}\right) e^{-\frac{x_j+x_n}{L_p}} + \left(p_{n0} + C e^{-\alpha(x_j+x_n)}\right) e^{-\frac{x_{\text{epi}}}{L_p}}}{2 \sinh \left( \frac{x_{\text{epi}}-x_j-x_n}{L_p} \right)} \\
B &= \frac{\left(p_{n0} + C e^{-\alpha x_{\text{epi}}}\right) e^{\frac{x_j+x_n}{L_p}} - \left(p_{n0} + C e^{-\alpha(x_j+x_n)}\right) e^{\frac{x_{\text{epi}}}{L_p}}}{2 \sinh \left( \frac{x_{\text{epi}}-x_j-x_n}{L_p} \right)} \\
C &= \frac{\alpha \Phi_0 L_p^2}{D_p (1-\alpha^2 L_p^2)} \\
D &= -\frac{F \left( e^{-\alpha(x_j-x_p)} - e^{-\frac{x_j-x_p}{L_n}} \right) + n_{p0}}{2 \sinh \left( \frac{x_j-x_p}{L_n} \right)} \\
E &= -(D + F) \\
F &= \frac{\alpha \Phi_0 L_n^2}{D_n (1-\alpha^2 L_n^2)}
\end{align*}
\]

\( x_{\text{epi}} \) is the p-stop extension.
The diffusion current can be found as

\[ J_{diff} = J_{diff,p} + J_{diff,n} = -qD_p \frac{\partial p_n}{\partial x} \bigg|_{x=x_j+x_n} + qD_n \frac{\partial n_p}{\partial x} \bigg|_{x=x_j-x_p} \]

\[ J_{diff} = -q \frac{D_p}{L_p} \left( A e^{\frac{x_j+x_n}{L_p}} - B e^{-\frac{x_j+x_n}{L_p}} \right) + q \frac{D_n}{L_n} \left( D e^{\frac{x_j-x_p}{L_n}} - E e^{-\frac{x_j-x_p}{L_n}} \right) + qD_p \alpha C e^{-\alpha(x_j+x_n)} - qD_n \alpha F e^{-\alpha(x_j-x_p)} \]

which can be simplified as

\[ J_{diff} = \frac{q}{L_p} \frac{D_p}{1-\cosh K_p} e^{-\frac{x_j+x_n}{\sigma_p}} + \frac{q}{L_n} \frac{D_n}{1-\cosh K_n} e^{-\frac{x_j-x_n}{\sigma_n}} + qC D_p e^{-\alpha(x_j+x_n)} \left( \alpha - \frac{\cosh K_p}{L_p \sinh K_p} \right) + qF D_n e^{-\alpha(x_j-x_n)} \left( \alpha + \frac{\cosh K_n}{L_n \sinh K_n} \right) \]

\[ (7.9) \]

The parameters \( D_n, D_p, \tau_n, \) and \( \tau_p \) can be derived from the following empirical formulas for silicon, as a function of impurity densities

\[ \tau_p = \frac{1}{7.8 \times 10^{-18} N_D + 1.8 \times 10^{-31} N_A} \]

\[ D_p = \frac{kT}{q} \left( 370 + \frac{370}{1 + 1.263 \times 10^{-15} N_D} \right) \]

\[ \tau_n = \frac{3.45 \times 10^{-12} N_A + 9.5 \times 10^{-51} N_A}{1 + 1.25 \times 10^{-15} N_A} \]

\[ D_n = \frac{kT}{q} \left( 232 + \frac{1180}{1 + 1.125 \times 10^{-15} N_A} \right) \]

\[ (7.11) \]

The total current is the summation of the drift and diffusion currents.

\[ J_{opt} = J_{drift} + J_{diff} \]

\[ (7.12) \]

The above equations can be simplified for single-sided and shallow junctions for a better understanding of the effect of different parameters on the photoresponse of the device, but we keep them in their general form. The measured absorption coefficient of silicon is shown in Figure 7.3. Typical parameters of p-well–substrate and diffusion–well silicon junctions are shown in Table 7.1. The simulated quantum efficiency, \( J_{opt} / (q \Phi_0) \), for these devices is plotted in Figure 7.4. As can be seen, the quantum efficiency of the diffusion-substrate junction is higher than that of the other two structures, and it also spans a wider spectrum.
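To get a feel for the magnitudes involved, the built-in potential, the depletion extents, and the drift part of the quantum efficiency can be evaluated with a few lines of Python. This sketch uses the standard abrupt-junction expressions, in which the depletion region extends further into the more lightly doped side, and considers only the drift term, ignoring diffusion. The material constants are common approximate values for silicon at 300 K, and the doping levels, bias, junction depth, and absorption coefficient in the example are illustrative assumptions, not the entries of Table 7.1.

```python
import math

Q = 1.602e-19                 # electron charge, C
K_B = 1.381e-23               # Boltzmann constant, J/K
EPS_SI = 11.7 * 8.854e-14     # permittivity of silicon, F/cm
N_I = 1.45e10                 # intrinsic carrier density, cm^-3 (approximate, 300 K)


def built_in_potential(n_a, n_d, temp=300.0):
    """V0 = (kT/q) ln(N_A N_D / n_i^2), in volts."""
    return (K_B * temp / Q) * math.log(n_a * n_d / N_I ** 2)


def depletion_extents(n_a, n_d, v_r, temp=300.0):
    """Return (x_n, x_p) in cm for an abrupt junction under reverse bias v_r."""
    v_total = built_in_potential(n_a, n_d, temp) + v_r
    common = 2.0 * EPS_SI * v_total / (Q * (n_a + n_d))
    x_n = math.sqrt(common * n_a / n_d)   # extent into the n side
    x_p = math.sqrt(common * n_d / n_a)   # extent into the p side
    return x_n, x_p


def drift_quantum_efficiency(alpha, x_j, x_n, x_p):
    """|J_drift| / (q Phi_0) for a p-on-n junction at depth x_j (cm), alpha in 1/cm."""
    width = x_n + x_p
    return math.exp(-alpha * (x_j - x_p)) * (1.0 - math.exp(-alpha * width))


# Illustrative (assumed) values: p-well doping 1e16, n-substrate 1e15 cm^-3,
# 3 V reverse bias, 2 um junction depth, alpha = 3e3 1/cm.
x_n, x_p = depletion_extents(n_a=1e16, n_d=1e15, v_r=3.0)
eta_drift = drift_quantum_efficiency(alpha=3e3, x_j=2e-4, x_n=x_n, x_p=x_p)
```

Because the depleted charge on both sides must balance, the product of extent and doping is the same on each side, which is a convenient sanity check on any implementation.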

Table 7.1: Typical parameters of silicon junctions in a 2μm p-well standard process provided by Orbit Semiconductor Inc.
Figure 7.3: Measured absorption coefficient of silicon.

Figure 7.4: Simulated quantum efficiency versus wavelength for three different junction diodes in a 2μm process.
7.2.3 Quantum Efficiency of a Lateral Junction Diode

The structure of a lateral photodiode is shown in Figure 7.5. Before analyzing the device's photoresponse we make a few simplifying assumptions. We assume that only the area between the two diffusions is exposed to the light, because otherwise there will be a large contribution from the vertical bipolar component formed by p-diffusion/n-well/p-substrate. In reality the photogenerated electron-hole pairs will diffuse to other areas. We also assume that the effective depth of the device is only \( y_j \). Again, there will be some currents diffusing through other areas.

The diffusion equations in the p+ and n-well are

\[
D_p \frac{\partial^2 p_n}{\partial x^2} - \frac{p_n - p_{n0}}{\tau_p} + G(y) = 0 \quad \text{in the n-well}
\]
\[
D_n \frac{\partial^2 n_p}{\partial x^2} - \frac{n_p - n_{p0}}{\tau_n} = 0 \quad \text{in the p+ diffusion}
\]

By solving these equations and the boundary conditions \( p_n|_{x=x_j} = 0 \) and \( p_n|_{x=x_n} = 0 \), we will have

\[
p_n(x) = p_{n0} - \tau_p G(y) + \frac{C}{2 \sinh \left( \frac{x_j-x_n}{L_p} \right) e^{\frac{x_j}{L_p}}} - \frac{C}{2 \sinh \left( \frac{x_j-x_n}{L_n} \right) e^{\frac{x_j}{L_n}}}
\]
\[
J_{diff}(y) = -q D_p \frac{\partial p}{\partial x} = \frac{-q D_p (p_{n0} - \tau_p G(y))}{L_p \sinh \left( \frac{x_j-x_n}{L_p} \right) e^{\frac{x_j}{L_p}}} \left[ 1 - \cosh \left( \frac{x_j-x_n}{L_p} \right) \right]
\]

The drift current is simply

\[
J_{drift} = -q G(y) (x_n + x_p) \approx -q G(y) x_n
\]

The total current can be obtained by integrating the addition of the drift and diffusion components across the depth and width of the device.

\[
J_{total} = \int_0^{y_j} \left[ A - (B + qx_n) G(y) \right] dy = Ay_j + (B + qx_n) \Phi_0 \left\{ e^{-\alpha y_j} - 1 \right\}
\]

where

\[
A = \frac{-q D_p (p_{n0} - \tau_p G(y))}{L_p \sinh \left( \frac{x_j-x_n}{L_p} \right) e^{\frac{x_j}{L_p}}}
\]
\[
B = \frac{q D_p \tau_p (1 - \cosh \left( \frac{x_j-x_n}{L_p} \right))}{L_p \sinh \left( \frac{x_j-x_n}{L_p} \right) e^{\frac{x_j}{L_p}}}
\]
\[
x_n = \sqrt{\frac{2\epsilon(V_0 + V_r)}{q} \left( \frac{N_A}{N_D(N_A + N_D)} \right)}
\]
\[
V_0 = \frac{kT}{q} \ln \frac{N_A N_D}{n_i^2}
\]

Figure 7.6 shows the simulation result of this structure for a typical 2\( \mu \)m process. As one may expect there is a large blue response, as all the carriers generated close to the surface are absorbed by the device. The poor response at larger wavelengths is due to the fact that we have considered the contribution of only those carriers generated within a depth \( y_j \) of the surface, which is very shallow. One can combine this structure with the vertical photodiode by exposing all sides of the diode to the light.
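The wavelength dependence can be made concrete by computing the fraction of the incident photon flux absorbed within the shallow depth of the lateral device, which for an exponentially decaying generation rate is 1 − e^(−α y_j). In the Python sketch below the absorption coefficients are rough order-of-magnitude values for silicon assumed for illustration only (they are not read from Figure 7.3), and the 0.3 μm depth is likewise a hypothetical junction depth.

```python
import math


def absorbed_fraction(alpha_per_cm, depth_um):
    """Fraction of the photon flux absorbed between the surface and depth_um."""
    depth_cm = depth_um * 1e-4
    return 1.0 - math.exp(-alpha_per_cm * depth_cm)


# Assumed, order-of-magnitude absorption coefficients of silicon (1/cm).
ALPHA_ASSUMED = {
    "blue (450 nm)": 2.5e4,
    "red (650 nm)": 3.0e3,
    "near infra-red (850 nm)": 6.0e2,
}

Y_J_UM = 0.3   # hypothetical junction depth in micrometres

fractions = {name: absorbed_fraction(alpha, Y_J_UM)
             for name, alpha in ALPHA_ASSUMED.items()}
```

Under these assumptions roughly half of the blue light is absorbed in the first 0.3 μm, while only a small fraction of the red and near infra-red light is, which is the behavior seen in the simulated response.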
Figure 7.5: The structure of a lateral junction diode in an N-Well CMOS process.

Figure 7.6: Simulation result of the lateral photodiode in a 2μm CMOS process.
7.2.4 Quantum Efficiency of a Vertical Bipolar transistor

The structure of a vertical bipolar transistor is shown in Figure 7.7. We assume that only the flat area is exposed to the light, because otherwise there will be some contribution from the vertical walls of the emitter-base and base-collector junctions.

![Diagram of a vertical bipolar transistor](image)

Figure 7.7: The structure of a vertical bipolar detector in an N-Well CMOS process.

We can write the diffusion equation in the three regions as:

\[
\begin{align*}
D_{ne} \frac{\partial^2 n_{pe}}{\partial x^2} - \frac{n_{pe} - n_{pe0}}{\tau_{ne}} + G(x) &= 0 \quad \text{in the P-Emitter} \\
D_{pb} \frac{\partial^2 p_{nb}}{\partial x^2} - \frac{p_{nb} - p_{nb0}}{\tau_{pb}} + G(x) &= 0 \quad \text{in the N-Base} \\
D_{nc} \frac{\partial^2 n_{pc}}{\partial x^2} - \frac{n_{pc} - n_{pc0}}{\tau_{nc}} + G(x) &= 0 \quad \text{in the P-Collector}
\end{align*}
\]  

(7.19)

The boundary conditions are:

\[
\begin{align*}
& n_{pe} \big|_{x=0} = 0 \quad n_{pe} \big|_{x=x_{je}-x_{pe}} = n_{pe0} \left( e^{\frac{V_{EB}}{V_T}} - 1 \right) \quad \text{in emitter} \\
& p_{nb} \big|_{x=x_{je}+x_{ne}} = p_{nb0} \left( e^{\frac{V_{EB}}{V_T}} - 1 \right) \quad p_{nb} \big|_{x=x_{jc}-x_{nc}} = -p_{nb0} \quad \text{in base} \\
& n_{pc} \big|_{x=x_{jc}+x_{pc}} = -n_{pc0} \quad n_{pc} \big|_{x=x_{epi}} = 0 \quad \text{in collector}
\end{align*}
\]  

(7.20)
The diffusion equations can easily be solved.

\[
\begin{align*}
n_{pe}(x) &= n_{pe0} + A e^{\frac{x}{L_e}} + B e^{-\frac{x}{L_e}} + C e^{-\alpha x} \\
p_{nb}(x) &= p_{nb0} + E e^{\frac{x}{L_b}} + F e^{-\frac{x}{L_b}} + H e^{-\alpha x} \\
n_{pc}(x) &= n_{pc0} + K e^{\frac{x}{L_c}} + M e^{-\frac{x}{L_c}} + R e^{-\alpha x} \\
A &= \frac{-n_{pe0}(e^{V_{BB}} + e^{\frac{V_{BB}}{B}} - e^{-\frac{V_{BB}}{B}} - e^{-V_{BB}})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} - C(e^{-\alpha x}) \\
B &= \frac{-n_{pe0}(e^{V_{BB}} + e^{\frac{V_{BB}}{B}} - e^{-\frac{V_{BB}}{B}} - e^{-V_{BB}})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} - C(e^{-\alpha x}) \\
Z &= \frac{x_{je} - x_{pe}}{L_c} \\
C &= \frac{\Phi_{B} L_c^2}{D_b(1 - \alpha^2 L_c^2)} \\
E &= \frac{p_{eb0}(e^{-Y} - e^{-V_{BB}})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} - H(e^{-\alpha x} - e^{-X + \alpha x}) \\
F &= \frac{p_{eb0}(e^{Y} - e^{V_{BB}})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} - H(e^{-\alpha x} - e^{-X + \alpha x}) \\
X &= \frac{x_{je} + x_{nc}}{L_b} \\
Y &= \frac{x_{je} - x_{nc}}{L_b} \\
H &= \frac{\Phi_{B} L_c^2}{D_b(1 - \alpha^2 L_c^2)} \\
M &= \frac{-n_{pe0}(2e^{W + \alpha x} - e^{X + \alpha x} - e^{-\alpha x})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B} + \frac{V_{BB}}{B}} - e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} \\
K &= \frac{-n_{pe0}(2e^{-W - \alpha x} - e^{-X - \alpha x} - e^{\alpha x})}{e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B} + \frac{V_{BB}}{B}} - e^{\frac{V_{BB}}{B} + \frac{V_{BB}}{B} + \frac{V_{BB}}{B}}} \\
U &= \frac{x_{je} + x_{pe}}{L_c} \\
W &= \frac{x_{pe} - x_{nc}}{L_c} \\
R &= \frac{\Phi_{B} L_c^2}{D_b(1 - \alpha^2 L_c^2)}
\end{align*}
\]

The diffusion component of the emitter and collector currents can be found by

\[
\begin{align*}
J_{diff,E} &= -qD_e \frac{\partial n_{pe}(x)}{\partial x} \bigg|_{x=x_{je} - x_{pe}} + qD_b \frac{\partial p_{nb}(x)}{\partial x} \bigg|_{x=x_{je} + x_{ne}} \\
J_{diff,C} &= -qD_c \frac{\partial n_{pc}(x)}{\partial x} \bigg|_{x=x_{jc} + x_{pc}} + qD_b \frac{\partial p_{nb}(x)}{\partial x} \bigg|_{x=x_{jc} - x_{nc}}
\end{align*}
\]

\[
\begin{align*}
J_{drift,E} &= \int_{depletion} -qG(x) dx \\
J_{drift,E} &= -q\Phi_0(e^{-\alpha (x_{je} - x_{pe})} - e^{-\alpha (x_{je} + x_{ne})}) \\
J_{drift,C} &= +q\Phi_0(e^{-\alpha (x_{jc} - x_{nc})} - e^{-\alpha (x_{jc} + x_{pc})})
\end{align*}
\]

As the base of this device is floating, the collector and emitter currents should be equal. The only unknown parameter is \(V_{EB}\). The value of \(V_{EB}\) for which \(I_C = I_E\) can be found using numerical methods. Figure 7.8 shows the quantum efficiency of a typical parasitic PNP transistor in a 2µm process. The large gain is simply due to the current gain of the bipolar transistor, which is larger than one. Simulations reveal that the current gain is highly dependent on the base and emitter doping densities. As one could expect, the response is relatively flat over the visible spectrum. This is a result of having two junctions at two different depths in the device.
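One simple numerical method for finding the emitter-base voltage is bisection on the difference between the emitter and collector currents. The sketch below illustrates the idea with a toy Ebers-Moll-style model standing in for the full expressions derived above: the saturation current `I_S`, the forward transport factor `ALPHA_F`, and the photocurrent `I_PH` are hypothetical values, and the two current functions are placeholders, not the equations of this section.

```python
import math

V_T = 0.02585      # thermal voltage at about 300 K, V
I_S = 1e-15        # saturation current, A (hypothetical)
ALPHA_F = 0.98     # forward transport factor (hypothetical)
I_PH = 1e-12       # photocurrent collected at the base-collector junction, A (hypothetical)


def emitter_current(v_eb):
    """Toy emitter current: injection across the forward-biased junction."""
    return I_S * (math.exp(v_eb / V_T) - 1.0)


def collector_current(v_eb):
    """Toy collector current: transported emitter current plus photocurrent."""
    return ALPHA_F * emitter_current(v_eb) + I_PH


def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Find a root of f in [lo, hi] by bisection; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("root is not bracketed")
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid
        else:
            lo, f_lo = mid, f_mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# With a floating base, the operating point is where I_E equals I_C.
v_eb = bisect_root(lambda v: emitter_current(v) - collector_current(v), 0.0, 1.0)
gain = emitter_current(v_eb) / I_PH
```

In this toy model the resulting emitter current is the photocurrent multiplied by 1/(1 − α_F), which illustrates why the quantum efficiency of the phototransistor can exceed one.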
Figure 7.8: Simulated quantum efficiency of a vertical bipolar transistor in a 2μm CMOS process. Note that the quantum efficiency is greater than “1”, due to the current gain of the transistor.

### 7.2.5 Quantum Efficiency of a Lateral Bipolar Photodetector

The structure of a lateral bipolar device is shown in Figure 7.9. Before analyzing the device’s photoresponse we make a few simplifying assumptions. We assume that only the area between the emitter and collector diffusions, together with the depletion regions, is exposed to the light, because otherwise there will be a large contribution from the vertical bipolar components formed by p-diffusion/n-well/p-substrate. In reality the photogenerated electron-hole pairs will diffuse to other areas. We also assume that the effective depth of the device is only \(y_j\). Again, there will be some currents diffusing through other areas.

Note that in the equations \(x\) denotes the horizontal axis and \(y\) the vertical axis. Also \(y_j\) is the depth of the collector/emitter junctions.

The diffusion equations in three regions can be written as:

\[
\begin{align*}
D_{ne} \frac{\partial^2 n_{pe}}{\partial x^2} - \frac{n_{pe} - n_{pe0}}{\tau_{ne}} &= 0 & & \text{in the P-Emitter} \\
D_{pb} \frac{\partial^2 p_{pb}}{\partial x^2} - \frac{p_{pb} - p_{pb0}}{\tau_{pb}} + G(y) &= 0 & & \text{in the N-Base} \\
D_{nc} \frac{\partial^2 n_{pc}}{\partial x^2} - \frac{n_{pc} - n_{pc0}}{\tau_{nc}} &= 0 & & \text{in the P-Collector}
\end{align*}
\]  
(7.24)

The diffusion length in the collector and emitter regions is very short. Therefore, we make another simplifying assumption that these junctions extend four times the diffusion length in these regions. We set the origin at \(4L_e\) before the start of the emitter junction, to be able to reuse the derivations for the vertical bipolar transistor.

The boundary conditions for the three regions are:

\[
\begin{align*}
 n_{pe} |_{x=0} &= 0 & & n_{pe} |_{x=x_{je}-x_{pe}} = n_{pe0}(e^{\frac{V_{EB}}{V_T}} - 1) & & \text{in emitter} \\
 p_{pb} |_{x=x_{je}+x_{ne}} &= p_{pb0}(e^{\frac{V_{EB}}{V_T}} - 1) & & p_{pb} |_{x=x_{jc}-x_{nc}} = -p_{pb0} & & \text{in base} \\
 n_{pc} |_{x=x_{jc}+x_{pc}} &= -n_{pc0} & & n_{pc} |_{x=x_{jc}+4L_c} = 0 & & \text{in collector}
\end{align*}
\]  
(7.25)
Figure 7.9: The structure of a lateral bipolar detector in an N-Well CMOS process.

\[ n_{pe}(x) = n_{pe0} + A e^{\frac{x}{L_e}} + B e^{-\frac{x}{L_e}} \]
\[ p_{pb}(x) = p_{pb0} + G(y)\tau_{pb} + E e^{\frac{x}{L_b}} + F e^{-\frac{x}{L_b}} \]
\[ n_{pc}(x) = n_{pc0} + K e^{\frac{x}{L_c}} + M e^{-\frac{x}{L_c}} \]
\[ A = -n_{p0}(e^{\frac{V_{EB}}{T} + \frac{y}{2} - \frac{y}{2} - \frac{1}{2}}) \]
\[ B = -n_{p0}(e^{\frac{V_{EB}}{T} + \frac{y}{2} + \frac{y}{2} - \frac{1}{2}}) \]
\[ Z = \frac{x_{jc} - x_{ne}}{L_b} \]
\[ x_{je} = 4L_e \]
\[ E = (p_{r0} + G(y)\tau_{rb})E_1 = (p_{r0} + G(y)\tau_{rb})\frac{e^{-Y\left(\frac{V_{EB}}{e} + \frac{1}{2} - \frac{1}{2}\right)}}{\frac{1}{e}} \]
\[ F = (p_{r0} + G(y)\tau_{rb})F_1 = (p_{r0} + G(y)\tau_{rb})\frac{e^{Y\left(\frac{V_{EB}}{e} + \frac{1}{2} - \frac{1}{2}\right)}}{\frac{1}{e}} \]
\[ X = \frac{x_{jc} + x_{ne}}{L_b} \]
\[ Y = \frac{x_{jc} - x_{ne}}{L_b} \]
\[ M = \frac{-n_{p0}(2W + \frac{W}{2} - \frac{W}{2})}{e^{\frac{V_{EB}}{e} + \frac{1}{2} - \frac{1}{2}}} \]
\[ K = \frac{-n_{p0}(2W + \frac{W}{2} - \frac{W}{2})}{e^{\frac{V_{EB}}{e} + \frac{1}{2} - \frac{1}{2}}} \]
\[ U = \frac{x_{jc} + x_{ne}}{L_c} \]
\[ W = \frac{x_{jc} + 4L_c}{L_c} \]

The diffusion currents at collector and emitter are:
\[ J_{diff,E}(y) = -qD_e \frac{\partial n_{pe}(x)}{\partial x} \bigg|_{x=x_{je}-x_{pe}} + qD_b \frac{\partial p_{pb}(x)}{\partial x} \bigg|_{x=x_{je}+x_{ne}} \]
\[ J_{diff,C}(y) = -qD_c \frac{\partial n_{pc}(x)}{\partial x} \bigg|_{x=x_{jc}+x_{pc}} + qD_b \frac{\partial p_{pb}(x)}{\partial x} \bigg|_{x=x_{jc}-x_{nc}} \]
\[ J_{drift,E}(y) = -qG(y)x_{ne} \]
\[ J_{drift,C}(y) = -qG(y)x_{nc} \]
The emitter and collector currents can be obtained by integrating the corresponding drift and diffusion components of each current over the range \([y = 0 \text{ to } y = y_j]\). Notice that the current density is per unit width of the device. These should be divided by the junction depth \(y_j\) to yield a current density per unit area. The simulation result for a PNP device with minimum diffusion spacing (3\(\lambda\)) in a 2\(\mu\)m CMOS process is shown in Figure 7.10. The general shape of the quantum efficiency is very similar to that of a lateral photodiode (Figure 7.6).

\[
\begin{align*}
J_{\text{drift}, E} &= -q\Phi_{\text{diff}}(1 - e^{-\alpha y_j}) \\
J_{\text{drift}, C} &= -q\Phi_{\text{diff}}(1 - e^{-\alpha y_j}) \\
J_{\text{diff}, E} &= -q D_e \left\{ \frac{A}{L_e} e^Z - \frac{B}{L_e} e^{-Z} \right\} + q D_b \tau_{pb} \{ \frac{E_b}{L_b} e^X - \frac{F_b}{L_b} e^{-X} \} \left\{ \frac{\Phi_0}{y_j} (1 - e^{-\alpha y_j}) \right\} \\
J_{\text{diff}, C} &= -q D_e \left\{ \frac{K}{L_e} e^U - \frac{M}{L_e} e^{-U} \right\} + q D_b \tau_{pb} \{ \frac{E_b}{L_b} e^Y - \frac{F_b}{L_b} e^{-Y} \} \left\{ \frac{\Phi_0}{y_j} (1 - e^{-\alpha y_j}) \right\}
\end{align*}
\]

\[(7.29)\]

**Figure 7.10:** Simulated quantum efficiency of a lateral bipolar transistor in a 2\(\mu\)m CMOS process.
7.2.6 Mixed structures

From the simulation results for the lateral and vertical devices obtained in the previous sections, it is clear that vertical devices have a relatively flat response over the visual spectrum, while lateral devices have a better blue response. New structures can be designed by combining the lateral and vertical devices, and only minor changes to each device are needed. For the photodiode structures, all that is needed is to make the exposure window opening large enough so that the edges of the diode are also exposed. Figure 7.11 illustrates the mixed devices.

Figure 7.11: a) A mixed lateral and vertical photodiode. b) A mixed lateral and vertical bipolar transistor.
7.2.7 Quantum Efficiency of a Photogate

The structure of a photogate is shown in Figure 7.12. A photogate is nothing but a MOS capacitor exposed to light. The principle of operation of a photogate is the integration of photogenerated carriers in the depletion region, which is created by applying a large voltage to the gate. A simplifying assumption that we make here is that the depth of the depletion region is small. One can verify this using the following equation.

\[
x_d = -B + \sqrt{B^2 + 4AV_c}
\]

\[
A = \frac{qN_s}{2\varepsilon_0\varepsilon_{ox}}
\]

\[
B = \frac{q^2N_s^2}{2\varepsilon_0^2}\]

(7.30)

In a 2\(\mu\)m process the typical values for \(x_d\) are less than 0.5\(\mu\)m.

Therefore, it is reasonable to assume that all the charges filling the potential well are diffusing from areas outside the depletion region.

One important drawback of photogates is that they have a very poor blue response, because the gate material absorbs that part of the spectrum. In new processes (such as HP processes available through MOSIS) the gate is silicided, which blocks most of the visible spectrum. In these processes the silicide layer should be masked out from the areas above the photogate. Another solution is to make several windows in the gate so that light can pass through. Even with polysilicon gates it is recommended to use a windowed gate for photogate devices.

The spectral response of the photogate is obtained by solving the diffusion equation in the substrate area. Notice that a photogate works in a reset-and-integrate mode. During the reset cycle the potential well is emptied of charge. During the integration cycle, diffusion of photogenerated carriers fills up the potential well.

\[
J_{diff} = \frac{qD_e}{L_e}(Ae^X - Be^{-X})
\]

\[
A = \frac{-n_m0(2e^Y-e^{-Y})C(e^Y-e^{-Y})}{e^{2X}+e^{-2X}+C^2}
\]

\[
B = \frac{-n_m0(2e^{-Y}-e^Y)C(e^{-Y}-e^Y)}{e^{-2X}+e^{2X}+C^2}
\]

\[
X = \frac{x_d}{L_s}, \quad Y = \frac{x_{sub}}{L_s}, \quad C = \frac{\Phi_1\alpha L_e^2}{D_e(1-\alpha^2L_e^2)}
\]

(7.31)

\(\Phi_1\) is the photon flux at the surface of the silicon. If we assume that the gate material is polysilicon and has the same absorption coefficient as silicon, we will have:

\[
\Phi_1 = \Phi_0e^{-\alpha L_{gate}}
\]

(7.32)
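
As a rough numerical illustration of Equation 7.32, the sketch below evaluates the gate transmission \(\Phi_1/\Phi_0\) at a few wavelengths. The absorption coefficients are approximate round numbers for crystalline silicon and the 0.4 µm gate thickness is an assumption; neither is taken from this report, and the names `ALPHA_PER_UM` and `gate_transmission` are purely illustrative.

```python
import math

# Approximate absorption coefficients of crystalline silicon, in 1/um.
# Illustrative round numbers (assumed), not values from this report.
ALPHA_PER_UM = {400: 9.5, 500: 1.1, 600: 0.41, 700: 0.19, 800: 0.085}


def gate_transmission(alpha_per_um: float, gate_thickness_um: float) -> float:
    """Fraction Phi_1/Phi_0 of the photon flux left after the gate (Eq. 7.32)."""
    return math.exp(-alpha_per_um * gate_thickness_um)


if __name__ == "__main__":
    L_GATE_UM = 0.4  # assumed polysilicon gate thickness in micrometres
    for wavelength_nm, alpha in sorted(ALPHA_PER_UM.items()):
        t = gate_transmission(alpha, L_GATE_UM)
        print(f"{wavelength_nm} nm: Phi_1/Phi_0 = {t:.3f}")
```

Under these assumptions the transmission rises steadily from blue to red, which is the trend behind the poor blue response of the photogate.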

The simulated spectral response of the photogate is shown in Figure 7.13. As is expected the response is exponentially dependent on the wavelength, and the device has a better response for the red part of the spectrum.
Figure 7.12: Structure of a photogate device in an N-Well CMOS process.

Figure 7.13: a) The gate transmission coefficient. b) Spectral response of the photogate.
7.2.8 The Effect of Scaling on Photodetecting Elements

As device dimensions shrink, the quantum efficiency of the devices shifts to shorter wavelengths. The effect can be seen in Figure 7.14, where the simulated quantum efficiency of three junction diodes in six different processes is shown. The shift toward shorter wavelengths for scaled processes occurs because, as the vertical dimensions get shorter, the photogenerated electron-hole pairs contributing to the photocurrent are those closer to the surface. Longer wavelength photons can contribute only if the junction is deep enough. These simulated quantum efficiency curves clearly indicate that the best junction diodes, which still cover a good proportion of the visible spectrum with sufficient quantum efficiency, are the diffusion-substrate and well-substrate junctions.

In reality scaled processes may use different materials for diffusion and gate. For example, silicided diffusion and gate are used in almost all sub-micron processes to obtain low resistance. Silicide blocks most of the visible spectrum. To reduce the effect of silicides, they should either be masked out in the process, or windows should be opened in them through which light can reach the semiconductor.

Figure 7.14: Quantum efficiency of three different vertical junction diodes for six scaled processes.

7.2.9 Mismatch in Photodetecting Elements

When concerned about mismatch in vision chip circuitry, photodetectors are the last suspects. Firstly, photodetectors are generally large, since the aim is to increase the fill-factor and reduce the noise. Secondly, the processes through which photodetecting elements produce photocurrent depend mainly on the characteristics of the bulk semiconductor, which are better controlled than surface characteristics. For example, the typical standard deviation of the output current of the vertical bipolar transistor in Figure 7.1, with an area of $100\,\mu\mathrm{m}^2$, is less than 2%.

In photogates the well-filling process is mainly through diffusion of minority carriers in the substrate. However, the surface states at the Si-SiO$_2$ interface can also contribute to the recombination of stored carriers in the well. The higher mismatch in photogates can be associated with this process.
7.3 Photocircuits

A very important circuit in every vision chip is the front end circuitry which first receives the photocurrents. In general, the way that the input photocurrent is processed depends on the overall architecture of a specific vision chip. For example, in spatial vision chips the DC level of the inputs is important. The required photocircuits should preserve the DC level, and at the same time increase the dynamic range. For temporal vision chips the photocircuit should also preserve the temporal behavior of the inputs, while addressing the dynamic range problem.

A universal problem in image sensors is the dynamic range issue. The input light intensity varies over a large range of at least 10 decades. The human eye is capable of functioning over a 12-decade range. Obviously, adaptation mechanisms, either local or global, are needed to rescue the individual circuits, which at most can cope with seven decades of signal variation. The adaptation mechanisms will be covered in Section 7.7.

7.3.1 Logarithmic Sensor Using MOS Diodes

The simplest circuit for converting the photocurrent to voltage is the logarithmic conversion circuit shown in Figure 7.15. The logarithmic function is a result of the subthreshold operation of the diode connected MOS transistors. As the input photocurrent is usually very small and falls within the subthreshold region of a MOS diode, the current-voltage relationship is determined by

$$I = \begin{cases} \frac{W}{L} I_D e^{\frac{V}{V_T} \frac{1}{n}} & \text{(one diode)} \\ \frac{W}{L} I_D e^{\frac{V}{V_T} \frac{1}{n^2+n}} & \text{(two diodes)} \\ \frac{W}{L} I_D e^{\frac{V}{V_T} \frac{1}{n^3+n^2+n}} & \text{(three diodes)} \end{cases}$$

(7.33)

where $W$ and $L$ are the width and length of the transistor, respectively, $I$ is the input photocurrent, $n$ is the subthreshold slope factor, $I_D$ is a process dependent parameter, and $V$ is the output voltage. This circuit has been the workhorse of many vision chips. In many designs this circuit is used because of its small size and large dynamic range. At low light levels, however, the circuit exhibits a very slow response, which necessitates longer settling times.
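
The compression implied by Equation 7.33 can be made concrete by inverting it for the output voltage. The sketch below assumes that the single-diode exponent carries a factor \(1/n\), following the pattern of the two- and three-diode cases (this follows from stacking subthreshold MOS diodes with a common substrate). The values of `n`, `i_d`, and `v_t` are illustrative assumptions, not data from this report.

```python
import math


def exponent_divisor(n_diodes: int, n: float) -> float:
    """n, n^2 + n, or n^3 + n^2 + n for one, two, or three series diodes."""
    return sum(n ** k for k in range(1, n_diodes + 1))


def diode_chain_voltage(i_photo, n_diodes, n=1.5, i_d=1e-15,
                        w_over_l=1.0, v_t=0.0259):
    """Output voltage obtained by inverting Eq. 7.33 (assumed parameters)."""
    return v_t * exponent_divisor(n_diodes, n) * math.log(i_photo / (w_over_l * i_d))


def volts_per_decade(n_diodes, n=1.5, v_t=0.0259):
    """Change in output voltage for a tenfold change in photocurrent."""
    return v_t * exponent_divisor(n_diodes, n) * math.log(10.0)


if __name__ == "__main__":
    for count in (1, 2, 3):
        print(f"{count} diode(s): {1e3 * volts_per_decade(count):.0f} mV/decade")
```

Adding series diodes increases the voltage change per decade of photocurrent, at the cost of a larger output voltage for the same current.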

A disadvantage of this circuit is the extreme compression of the input signal. If the implemented algorithm needs to differentiate between signals, the logarithmic compression reduces the chance of detecting such differences. This is especially true for motion detection chips. When the contrast of the input image is low, the logarithmic compression reduces the contrast to such low levels that in many cases only very large contrast edges can be detected.

7.3.2 Photocircuit with Buffer-like Pull-up

Compensating the capacitive load present at the input node of the photodetector is essential for enhancing the dynamic response of the photocircuit, especially when either the input current is very low or the capacitive load is large. A method that has been used in several vision chips, with different topologies, is shown in Figure 7.16.

Figure 7.15: A logarithmic photocircuit with two series MOS diodes. Simulation results for three cases: one diode, two diodes, and three diodes.

The small signal operation of the circuit can be easily understood by deriving its transfer characteristics.

\[
\frac{v_o}{i_i} = \frac{-1}{[C_p + C_{gs}] \left[ S + \frac{g_m}{C_p + C_{gs}} \right]}
\]

Without the amplifier

\[
\frac{v_o}{i_i} = \frac{1}{C_{gs} \left[ S + \frac{g_m}{C_{gs}} \right]}
\]

With the amplifier

As is seen, the pole of the photocircuit with the amplifier in the feedback loop is transferred to a new location instead of \( g_m/C_p \). The gain is also enhanced. It should be mentioned that the pole location still depends on the input current level as we have \( g_m \propto I_i \) in subthreshold.
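
To give a feeling for how strongly the pole moves with light level, the sketch below evaluates a pole of the form \(g_m/C\) for several photocurrents. It assumes the common subthreshold approximation \(g_m = I/(nU_T)\) and a 100 fF node capacitance; both are illustrative assumptions rather than values from this report.

```python
import math


def subthreshold_gm(i_photo: float, n: float = 1.5, u_t: float = 0.0259) -> float:
    """Transconductance in subthreshold, assuming g_m = I / (n * U_T)."""
    return i_photo / (n * u_t)


def pole_frequency_hz(i_photo: float, c_node: float,
                      n: float = 1.5, u_t: float = 0.0259) -> float:
    """Pole at g_m / C expressed in hertz."""
    return subthreshold_gm(i_photo, n, u_t) / (2.0 * math.pi * c_node)


if __name__ == "__main__":
    C_NODE = 100e-15  # assumed node capacitance in farads
    for i_photo in (1e-12, 1e-11, 1e-10, 1e-9):
        print(f"I = {i_photo:.0e} A -> pole at {pole_frequency_hz(i_photo, C_NODE):.1f} Hz")
```

With these assumed numbers a 1 pA photocurrent places the pole at roughly 40 Hz, and every decade of photocurrent moves it by a decade.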

The DC operating point of this photocircuit is mainly determined by the amplifier. For example, if a symmetric inverter (a CMOS inverter with the transition region at Vdd/2) is used, the voltage at Vi will settle at Vdd/2. If a high gain differential amplifier is used with its positive input connected to a reference voltage, Vi will be set at that reference voltage. However, notice that the DC value of the output voltage Vo always depends on the input photocurrent, and if the reference voltage mentioned above is constant, the output voltage will be a logarithmic function of the input photocurrent. This means that the circuit still performs as a logarithmic compressor.

### 7.3.3 Photocircuit with Amplifier-like Pull-up

Figure 7.16: a) Without an amplifier in the feedback loop. b) Buffering photocircuit with an amplifier in the feedback loop.

Another photocircuit with enhanced dynamic behavior is shown in Figure 7.17. The small signal transfer characteristics of this photocircuit with and without the amplifier can be found as

\[
\frac{v_o}{i_i} = \frac{-1}{C_p \left[ S + \frac{g_m}{C_p} \right]}
\]
Without the amplifier

\[
\frac{v_o}{i_i} = \frac{-K}{[C_p - (K-1)C_{gd}] \left[ S + \frac{Kg_m}{C_p - (K-1)C_{gd}} \right]}
\]
With the amplifier

7.3.4 Buffered Logarithmic Photocircuit

The simple logarithmic circuit described in section 7.3.3 can be used with a buffer having a gain equal to or less than one, instead of the amplifier (Figure 7.18-a). In this case the circuit does not show significant dynamic improvement, and in fact it shows a slight degradation (this can be easily checked using the equation derived in section 7.3.3). However, the output is now buffered by the source follower stage.
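
The degradation mentioned above can be checked numerically from the pole expressions of section 7.3.3, \(g_m/C_p\) without the gain stage and \(Kg_m/[C_p-(K-1)C_{gd}]\) with it. The sketch below computes their ratio; the capacitance values are assumed for illustration only.

```python
def pole_ratio(k: float, c_p: float, c_gd: float) -> float:
    """Pole with a gain-K stage divided by the pole without it.

    Uses K*g_m / (C_p - (K - 1)*C_gd) over g_m / C_p, so g_m cancels.
    """
    denominator = c_p - (k - 1.0) * c_gd
    if denominator <= 0.0:
        raise ValueError("expression is not valid for C_p <= (K - 1) * C_gd")
    return k * c_p / denominator


if __name__ == "__main__":
    C_P, C_GD = 100e-15, 5e-15  # assumed capacitances in farads
    for k in (0.8, 1.0, 10.0):
        print(f"K = {k}: pole moves by a factor of {pole_ratio(k, C_P, C_GD):.2f}")
```

For a buffer with a gain slightly below one the ratio falls below one, which corresponds to the slight degradation; a gain above one moves the pole to a higher frequency.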

This circuit is ideally suited for driving large capacitive loads at the output. The advantage of the circuit shown in Figure 7.18-b over the simple buffering method in Figure 7.18-a is that the output voltage at \( V_o \) does not depend on the DC characteristics of the source follower buffer. Also the output voltage at \( V_o \) of the simple buffering method experiences a voltage drop of approximately \( V_T \), the threshold voltage of the MOS transistors.
Figure 7.17: a) Without an amplifier in the feedback loop. b) Buffering photocircuit with an amplifier in the feedback loop.

Figure 7.18: Schematic of the buffering photocircuits a) using the simple method, and b) using the buffered photocircuit.
7.3.5 Delbrück’s Adaptive Photocircuit.

One of the best photocircuits, satisfying many desired static and dynamic characteristics, is Delbrück’s adaptive photocircuit. This photocircuit, which is described in great detail in [Mead 94], can adapt to steady state (or long term) light intensity variations through a logarithmic transfer function, while having a large gain for short term variations.

The circuit is principally based on the photocircuit using buffering pull-up, described in section 7.3.2. However, in the feedback loop from the output of the amplifier an adaptive element and a capacitive voltage divider have been used. The adaptation function of this photocircuit depends on the characteristics of the adaptive element, shown as a shaded box in Figure 7.19. If we assume that it is just an extremely large resistor, one can easily see that at low frequencies the circuit operates similarly to the buffering pull-up photocircuit. At higher frequencies the gain of the circuit is boosted by the capacitive division of C1 and C2. The circuit is therefore capable of providing high gain for short term signals, which are usually of more importance than long term signals. Long term signals are logarithmically compressed.
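
A minimal first-order model of this behavior is sketched below. It treats the adaptive element as a linear resistor `r_adapt`, as assumed in the text, and the amplifier as ideal. It also assumes that C2 bridges the adaptive element and that C1 ties the feedback node to a fixed potential; this assignment of the two capacitors is an assumption made for illustration and should be checked against Figure 7.19.

```python
import math


def relative_gain(freq_hz: float, r_adapt: float, c1: float, c2: float) -> float:
    """Gain relative to the plain buffering pull-up circuit.

    The feedback factor is (1 + s*R*C2) / (1 + s*R*(C1 + C2)); with an ideal
    amplifier the closed-loop gain is its inverse.
    """
    s = 2j * math.pi * freq_hz
    feedback = (1 + s * r_adapt * c2) / (1 + s * r_adapt * (c1 + c2))
    return abs(1 / feedback)


if __name__ == "__main__":
    R, C1, C2 = 1e12, 1e-12, 1e-13  # assumed values: 1 Tohm, 1 pF, 0.1 pF
    for f in (0.0, 0.1, 1.0, 10.0, 1e3):
        print(f"{f:8.1f} Hz: relative gain = {relative_gain(f, R, C1, C2):.2f}")
```

In this model the gain is one for slow changes and approaches (C1 + C2)/C2 for fast changes, so the capacitor ratio sets the transient gain.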

An important component of this photocircuit is the adaptive element. Delbrück presents several structures for this element (shown in Figure 7.20) and argues, through experimental and analytical data, that the best choice is the structure in Figure 7.20-f. The choice has been made based on the leakage and offset current of these elements.

These adaptive elements, in addition to demonstrating a very large impedance at small voltages, allow a large current to pass at large voltages ( > 0.5 volt). This is useful for damping transients resulting from large input variations.

It should be noticed, however, that this circuit is useful for vision chips which require an amplified input for temporal processing. In other words, the higher gain for higher temporal frequencies will distort the original image, and therefore if such a photocircuit is used in imagers, this distortion should be taken into account. The best position for this photocircuit is in motion detection vision chips, as often in these chips there is a pressing requirement for high gain and high temporal contrast.

7.3.6 Cascoded Photocircuits

The photocircuits described in sections 7.3.2 and 7.3.3 can be improved further by using a cascode transistor in the amplifier path, to reduce the effect of the Miller capacitance. The new circuits are shown in Figure 7.21. It is straightforward to show that the small signal behavior of the circuits improves. The same principle can be used in designing the amplifiers used in these circuits, to reduce the capacitance between the input and output of the amplifier.

7.3.7 Current Amplifier Photocircuit

Usually the light intensity inside a normal room is below 1 lux. At these levels photocurrents are in the deep-subthreshold region ( < 1 nA). It is therefore desirable to increase the photocurrent, in a manner similar to photomultipliers. A very simple circuit which has been used in several designs is shown in Figure 7.22. The input current is injected into the base of a bipolar transistor. Hence, the current gain of the bipolar transistor (around 100) boosts the photocurrent. Note that this amplifies the signal and the noise at the same time, so the signal to noise ratio remains constant. However, by increasing the signal levels at the photodetector level, further circuits would not need to have a very large dynamic range.

Figure 7.19: The circuit diagram of the adaptive photodetector.

Figure 7.20: The circuit diagram of the adaptive photodetector.

Using the multisensitivity sensor described in section 2.17 the current gain can be boosted further by the current gain of the photodetector structure which is a bipolar transistor (See Figure 7.1). Using two MOS transistors the bipolar transistors could be switched on or off. However, the mechanisms which control the topology of the structure need more elaborate architectures which can incorporate abrupt topologic changes without other systematic misbehaviors, such as global instability. Hysteresis, for example, can be used to avoid instability due to sharp changes of the current.

Figure 7.21: Cascoded photocircuits.

Figure 7.22: Increasing the photocurrent using a bipolar transistor.
7.3.8 Integration Based Photocircuits

Almost all commercially available imagers and some of the vision chips described in this report use charge integration and sample-&-hold for transducing the photocurrent into a voltage. The basic diagram for this photocircuit is shown in Figure 7.23. Initially the reset transistor is on and the voltage at the input node $I_n$ is set to the reset value. Then this transistor is turned off and the photocurrent charges up the input capacitance at the input node. The input capacitance is usually comprised of the parasitic capacitances of the devices connected to this node. In some processes, e.g., GaAs, which have a large gate leakage current, an additional capacitor, $C_s$ may be needed to ensure enough charge storage time.

The sample and hold stage can often be removed. However, in this case after resetting the detector array and during the read-out process, some of the detectors still integrate charge while others are being read out. For simple imaging applications this does not impose serious problems, as the read out time is usually very short.

The charge integration method has several advantages and disadvantages. The advantages are its linear transfer characteristics, a dynamic range that can be controlled by changing the integration time, and low sensitivity to device mismatch, at least up to the S&H stage, because the integrated voltage depends on the input capacitance, which has less mismatch than other parameters of the circuit. Also, the integration principally acts as a low-pass filter which removes the high frequency components of the noise (both the device noise and the digital noise).
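
The linear transfer characteristic and the role of the integration time can be illustrated with the ideal relation \(\Delta V = I\,t_{int}/C\). The sketch below ignores leakage, reset noise, and saturation, and the numerical values are assumptions chosen only to give round results.

```python
def integrated_voltage_swing(i_photo: float, t_int: float, c_node: float) -> float:
    """Ideal voltage change on the input node after integrating for t_int."""
    return i_photo * t_int / c_node


def integration_time_for_swing(i_photo: float, v_swing: float, c_node: float) -> float:
    """Integration time needed to reach a given voltage swing."""
    return v_swing * c_node / i_photo


if __name__ == "__main__":
    C_NODE = 10e-15  # assumed input capacitance in farads
    for i_photo in (1e-13, 1e-12, 1e-11):
        t = integration_time_for_swing(i_photo, 1.0, C_NODE)
        print(f"I = {i_photo:.0e} A needs {1e3 * t:.1f} ms for a 1 V swing")
```

Because the required integration time scales inversely with the photocurrent, a single global integration time fixes the range of intensities that can be captured in one frame.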

The main disadvantage of this method is its inability to locally change the integration time, which means that the dynamic range for a specific integration time is always limited to a certain global value. However, recent progress in the design of smart sensors has made it possible to control the integration time of individual pixels at the expense of some area [Chen and Ginosar 95, Chen and Ginosar 96].

Figure 7.23: Integration based photocircuit.
7.4 Circuits and techniques for active pixel sensors

Active pixel sensors (APS)\(^2\) are in fact imagers in which each pixel incorporates some minimum circuitry to improve the image acquisition characteristics. CMOS based APS have been studied since the 1980s by the Japanese electronics industry, as an alternative to CCD cameras. Recently, the new market for multimedia cameras has renewed interest in these concepts.

Unlike vision chips where some high level image processing is performed at the pixel level, APS only try to capture the image with a focus toward improving the image quality using standard processes. In this regard APS can be regarded as less smart sensors.

APS are used extensively in astronomy and space exploration, in addition to the recent interest in multimedia applications for video and still imaging. APS can achieve low noise, large dynamic range, high speed, random access to pixels, and so on. Most of the circuits used in APS can also be used in vision chips, because they are smaller and less complex than those used in smart vision sensors. Also, the higher dynamic range, lower noise, or higher speed brought by the additional circuits in APS are just as essential for vision chips. Some of the techniques used for enhancing the performance of APS can also be applied to vision chips.

7.4.1 Photocircuits in active pixel sensors

The pixel circuit (photocircuit) in active pixel sensors is often based on the charge integration method (See section 7.3.8). The photodetector structure used for APS can be based on any of the photodetector structures described in section 7.2. The read-out circuit used in APS is often the source follower based circuit described in section 7.9.1. In fact, APS introduce very few new concepts.

Figure 7.24 illustrates two of the common photocircuits used in APS. In the photogate-based circuit, node \(X\) is initially reset and charge starts integrating in the potential well under the photogate, which is created by applying a large voltage to the gate of the photogate device. After the integration cycle the charge is transferred to node \(X\) and read out. In the photodiode-based circuit, after the reset operation charge is continuously integrated on node \(X\) until the next reset. Notice that there is no Sample&Hold stage in either of these photocircuits, and in reality while the outputs of some of the photocircuits are being read out other photocircuits are still integrating charge. In simple imagers this does not impose any problems, as the read-out time is much shorter than the integration time.

The advantages and disadvantages of both structures are as follows.

- **Quantum efficiency**
  The quantum efficiency (QE) of a photogate is about half that of a photodiode, because the gate usually blocks a large amount of the incoming light.

- **Fixed pattern noise**
  Fixed pattern noise (FPN) is higher in photogates than in photodiodes, as there are some random surface recombination processes, which are more effective in a photogate than in a photodiode.

- **Simplicity**
  A photogate requires a biasing voltage for the gate. Also a transfer gate, \(TX\), is required for transferring the charge from the potential well to the read-out node.

- **Correlated double sampling**
  The reset and charge integration nodes are isolated in a photogate-based APS. Therefore, after resetting node \( X \), the reset value of node \( X \) can first be read out and stored on a capacitor; then, after the accumulated charge has been transferred to node \( X \), its value is read. Subtracting the two values cancels a large amount of FPN.

In the illustrated photodiode-based circuit the integration and reset nodes are the same. Therefore, it is not possible to perform the correlated double sampling operation in the same frame period. To add this facility to the photodiode-based circuit, a transfer gate similar to that of the photogate-based circuit can be added.

\(^2\)Future revisions of this section will include detailed analysis of various active pixel structures, and design principles and guidelines.

**Figure 7.24:** a) APS photocircuit using a photogate. b) Using a photodiode.

### 7.4.2 Correlated double sampling

Fixed pattern noise (FPN) [Fry et al. 70] is one of the main disadvantages of CMOS imagers in comparison with CCD imagers. In a CCD imager charge is transferred between neighboring CCD elements with a high charge transfer efficiency. The amount of the charge collected by a pixel in a CCD imager is also not heavily dependent on the parameters of the device. However, in CMOS imagers the charge at a pixel passes through CMOS circuits which in addition to adding some systematic nonlinearity have a high mismatch.

In the simplest method, FPN is cancelled by on-chip or off-chip storage of the offset values, obtained by reading the outputs of the photocircuits while they are reset. This method, however, requires a large amount of memory for storing the whole offset information.

Correlated double sampling (CDS) is another method, in which two samples are taken during the pixel read-out cycle: one when the pixel is still in the reset state, and one when the charge has been transferred to the read-out node. The two values are then used as differential signals in further stages, such as programmable gain amplifiers (PGA) or ADC. The CDS circuit is shown in Figure 7.25. Although CDS reduces the fixed pattern noise to a large extent, mismatch in the CDS circuits at each column introduces a column-FPN component. This noise can be reduced by using a similar concept.

It should be noted that this CDS circuit is completely effective only if the FPN is intensity independent, the circuits (for example the source follower stage at each pixel) are linear, and the mismatch has only an offset component. In reality these assumptions do not hold, and more elaborate CDS circuits which can compensate for gain mismatch and nonlinearity in the circuits are required.
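
The limitation described above can be seen in a toy model in which each pixel has a linear read-out with its own offset and gain. The model, the function names, and the mismatch values below are illustrative assumptions, not a description of the circuit in Figure 7.25.

```python
def pixel_output(signal: float, offset: float, gain: float = 1.0) -> float:
    """Linear pixel read-out model: gain * signal + offset (assumed)."""
    return gain * signal + offset


def cds_readout(signal: float, offsets, gains):
    """Difference between the signal sample and the reset sample per pixel."""
    results = []
    for offset, gain in zip(offsets, gains):
        reset_sample = pixel_output(0.0, offset, gain)
        signal_sample = pixel_output(signal, offset, gain)
        results.append(signal_sample - reset_sample)
    return results


if __name__ == "__main__":
    offsets = [0.02, -0.01, 0.03]          # assumed offset mismatch in volts
    print(cds_readout(0.5, offsets, [1.0, 1.0, 1.0]))    # offsets cancel
    print(cds_readout(0.5, offsets, [1.0, 0.98, 1.03]))  # gain mismatch remains
```

With identical gains the three pixels give the same CDS output for the same signal, while with mismatched gains a signal-dependent residue survives the subtraction.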

Figure 7.25: Schematic diagram of the correlated double sampling circuit. There is one such circuit for every column.
## 7.5 Spatial Processing

Image processing in the spatial domain is a very complicated task, which cannot be fully covered either in this report or in the dedicated literature. Extracting spatial information from 1-D or 2-D images generally involves regularization to constrain the ill-posed problems, applying *a priori* information to further restrict the solution space, and constraining the image or scene (for example at industrial sites). Without these simplifying assumptions, it is impossible to solve image processing problems, at least for natural scenes. This complexity of image processing is directly transferred to vision chips trying to emulate image processing tasks. While computational algorithms are progressing, most vision chips have adopted biological models, as live examples of successful vision systems. Most biological models proposed for vision have simple architectures not rivaled by computational models. However, biological models generally suffer from excessive simplification due to insufficient understanding of biological visual systems.

With this introduction I skip the basic principles for many spatial image processing algorithms and instead describe circuits and building blocks required for these algorithms.

Hardware realization of computational models, in addition to requiring complex arithmetic operations, often requires spatial information over a large neighborhood. Interconnections are known to be a major limiting factor in realizing networks of any type and size. Therefore, computational models are not considered VLSI friendly in this sense. Biological models, on the other hand, use simple functional blocks interacting with their nearest neighbors, features which are very attractive for VLSI implementations.

### 7.5.1 Linear Resistive networks

A large number of vision chips require processing of information within a neighborhood. Resistive networks have been known as a method of providing local interaction between cells with minimum requirements in terms of space and interconnection. Although general network theories are very helpful in understanding the type of functions realizable using resistive networks, it is in general difficult to find a resistive network suitable for a specific problem. There are a number of resistive networks that have been fully analyzed and characterized. The simplest resistive network is the smoothing network shown in Figure 7.26. The kernel of this network is an exponential function whose decay rate depends on the ratio of $R_1$ and $R_2$. For a spatial impulse voltage input the voltage distribution is given by [Mead 89b]

$$\frac{V_{out_n}}{V_0} = \gamma^n = e^{n \ln \gamma}$$

where

$$\gamma = 1 + \frac{R_2}{2R_1} - \sqrt{\frac{R_2}{R_1}} \sqrt{1 + \frac{R_2}{4R_1}}$$

The exponentially decaying smoothing function is not generally used in image processing algorithms, but as it is the simplest network providing spatial smoothing, it has been preferred in VLSI implementations over more complicated and area consuming networks, for example Gaussian filtering (See section 2.7). The errors due to device mismatch are usually dominant and exceed the difference between an exact Gaussian function and an approximating exponential function. Therefore, it is rather unnecessary to use more accurate approximations, unless the specific algorithm relies heavily on the shape of the function and device mismatch can be controlled within the desired range.
A more rigorous analysis of resistive networks for early vision processing can be found, for example in [Wyatt 94].
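
The decay factor above can be evaluated directly. The sketch below computes \(\gamma\) from the resistor ratio and lists the normalized response at successive nodes, taking the exponent to be the node index counted from the driven node (an assumption about the notation). It can be verified that this \(\gamma\) is the root smaller than one of \(\gamma^2 - (2 + R_2/R_1)\gamma + 1 = 0\), which is the characteristic equation of a uniform ladder network.

```python
import math


def decay_factor(r1: float, r2: float) -> float:
    """Per-node attenuation gamma of the smoothing network."""
    a = r2 / r1
    return 1.0 + a / 2.0 - math.sqrt(a) * math.sqrt(1.0 + a / 4.0)


def impulse_response(r1: float, r2: float, n_nodes: int):
    """Normalized node voltages gamma**n for n = 0 .. n_nodes - 1."""
    gamma = decay_factor(r1, r2)
    return [gamma ** n for n in range(n_nodes)]


if __name__ == "__main__":
    for ratio in (0.01, 0.1, 1.0):
        gamma = decay_factor(1.0, ratio)
        print(f"R2/R1 = {ratio}: gamma = {gamma:.3f}")
```

A small ratio gives \(\gamma\) close to one and therefore a wide smoothing kernel, while a large ratio confines the response to the driven node.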

### 7.5.2 Smoothing networks

The most intuitive network for performing spatial smoothing is the one formed using resistive networks. In such networks a resistive grid receives the input current and each node distributes its current among its neighbors. The output can be taken for example by reading the node voltages. We will see that even the most intricate circuits described in this chapter work based on this simple principle.

A simple spatial smoothing circuit, which uses the principle of current distribution into a resistive network is illustrated in Figure 7.27. If all the elements in the network have equivalent impedances, as shown in the figure, one can easily derive the equation relating the output and input currents.

\[
I_{\text{out}}(n) = \frac{1}{3}[I_{\text{in}}(n-1) + I_{\text{in}}(n) + I_{\text{in}}(n+1)] + \frac{R_y}{R_x}[I_{\text{out}}(n-2) + 2I_{\text{out}}(n-1) + 2I_{\text{out}}(n+1) + I_{\text{out}}(n+2)]
\]  
(7.38)

This is a linear recursive transfer function. It can be seen that for \(R_y \ll R_x\)

\[
I_{\text{out}}(n) = \frac{1}{3}[I_{\text{in}}(n-1) + I_{\text{in}}(n) + I_{\text{in}}(n+1)]
\]  
(7.39)

For \(R_y \gg R_x\) all the output nodes are in fact virtually shorted and

\[
I_{\text{out}}(n) = \frac{1}{N} \sum I_{\text{in}} \quad \forall \quad n
\]  
(7.40)
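
The two limiting cases in Equations 7.39 and 7.40 can be written out as a short sketch. The treatment of the edge nodes, which are simply omitted in the local average, is an assumption made for brevity.

```python
def local_average(i_in):
    """Eq. 7.39: three-point average, evaluated at interior nodes only."""
    return [(i_in[k - 1] + i_in[k] + i_in[k + 1]) / 3.0
            for k in range(1, len(i_in) - 1)]


def global_average(i_in):
    """Eq. 7.40: every output node carries the mean of all inputs."""
    mean = sum(i_in) / len(i_in)
    return [mean] * len(i_in)


if __name__ == "__main__":
    currents = [0.0, 3.0, 0.0, 0.0, 0.0]
    print(local_average(currents))
    print(global_average(currents))
```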

Implementing such a resistive network would not be economical in standard CMOS processes, because linear passive resistors with large values (to satisfy power consumption constraints) are not readily available. Also, the smoothing constant of the network would be fixed if all the elements are fixed.

By replacing the resistive elements with translinear elements (e.g. a junction diode or a MOS diode), a more economic circuit can be realized (See Figure 7.28). The expressions describing the function of the circuit can be obtained by applying the translinear principle in the loops indicated by dashed lines, and also the KCL at the input and output nodes of the circuit. It is assumed that all the elements are identical. The derivations can be easily extended for a network with different element values at each branch. Here we only consider this simple case.

\[
\begin{aligned}
I_{x1} \times I_{out}(1) &= I_{x12} \times I_{out}(2) \\
I_{x2} \times I_{out}(2) &= I_{x21} \times I_{out}(1) \\
I_{x2} \times I_{out}(2) &= I_{x23} \times I_{out}(3) \\
I_{x3} \times I_{out}(3) &= I_{x32} \times I_{out}(2) \\
I_{in}(2) &= I_{x21} + I_{x2} + I_{x23} \\
I_{out}(2) &= I_{x12} + I_{x2} + I_{x32}
\end{aligned}
\] (7.41)

A generalized expression can be easily obtained.

\[
I_{out}(n) = \frac{I_{in}(n-1)}{\frac{I_{out}(n)}{I_{out}(n-2)} + \frac{I_{out}(n)}{I_{out}(n-1)} + 1} + \frac{I_{in}(n)}{\frac{I_{out}(n)}{I_{out}(n-1)} + 1 + \frac{I_{out}(n)}{I_{out}(n+1)}} + \frac{I_{in}(n+1)}{1 + \frac{I_{out}(n)}{I_{out}(n+1)} + \frac{I_{out}(n)}{I_{out}(n+2)}}
\] (7.42)

Although this is a nonlinear recursive transfer function, the network exhibits a near perfect averaging function similar to a rectangular smoothing window spreading over three neighboring inputs.

A drawback of this circuit is the fixed width of its smoothing window. The network shown in Figure 7.29 achieves wider smoothing by adding another stage of the current distribution network. In addition, the middle branch of each stage can be bypassed by a MOS transistor acting as a switch. The smoothing window of this new network can be adjusted to five, three, or zero (no smoothing operation). This circuit has been used for realizing the multiplicative noise cancellation (MNC) operation in the second motion detection chip, which is described in section 3.11.

This circuit can be modified so that the shape of the smoothing window can be adjusted by varying some bias voltages in the circuit. Figure 7.30 illustrates this new idea. Here the transconductance of the transistors is controlled by the gate voltage. The relationship between the output and input currents can be derived as

\[
I_{out}(n) = \frac{KI_{in}(n-1)}{2K + 1} + \frac{I_{in}(n)}{2K + 1} + \frac{KI_{in}(n+1)}{2K + 1}
\] (7.43)

where \( K = \exp \left( \frac{V_{th}}{n U_T} \right) \), \( n \) is a process dependent parameter, and \( U_T = kT/q \).

By using two stages of this circuit (similar to that in Figure 7.29) an adjustable-shape smoothing window covering five neighboring cells can be obtained.
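The window of Equation 7.43 and its two-stage extension can be explored numerically. The following Python sketch is only an illustration. The function names are hypothetical, and it assumes that both stages use the same $K$ and that the second stage simply filters the output of the first, so that the five-cell window is the convolution of the three-cell window with itself.

```python
def window(k):
    """Three-cell window of Eq. 7.43: [K, 1, K] / (2K + 1)."""
    norm = 2.0 * k + 1.0
    return [k / norm, 1.0 / norm, k / norm]


def convolve(a, b):
    """Full discrete convolution of two short kernels."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def two_stage_window(k):
    """Five-cell window of two cascaded identical stages (assumed linear cascade)."""
    w = window(k)
    return convolve(w, w)


if __name__ == "__main__":
    for k in (0.0, 0.5, 1.0, 4.0):
        print(k, window(k), two_stage_window(k))
```

With $K = 1$ the single stage reduces to the rectangular window, and as $K$ approaches zero the window collapses onto the center cell, which is the sense in which the bias voltage adjusts the shape of the window.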

The main drawback of the smoothing circuits described so far is their fixed window size. A slightly modified version of Figure 7.30 is shown in Figure 7.31 (see [Andreou et al. 91a, Andreou and Boahen 94b, Andreou and Boahen 96]). Note that the horizontal transistors operate in the ohmic region. This circuit can be analyzed using the translinear principle by decomposing each horizontal transistor into two back-to-back transistors operating in the saturation region (see Figure 7.32). Writing the translinear equations in the loops marked by dashed lines (note that the loops end at constant voltages $V_r$ and $V_c$) and KCL at the circuit nodes, we have

\[
\begin{aligned}
I_{in}(2) &= I_{out}(2) + I_{x:32} - I_{x:23} + I_{x:12} - I_{x:21} \\
I_{x:32} &= K I_{out}(2) \\
I_{x:12} &= K I_{out}(2) \\
I_{x:21} &= K I_{out}(1) \\
I_{x:23} &= K I_{out}(3)
\end{aligned}
\] (7.44)

where $K = e^{\frac{V_r-V_c}{nU_T}}$. From these equations we will have

\[
I_{out} - K \Delta^2 I_{out} = I_{in}
\] (7.45)

$\Delta^2$ is the second spatial-derivative operator. One can easily work out that the impulse response of this network is an exponentially decaying function with a decay rate of $1/\sqrt{K}$. Note that this circuit cannot be implemented using bipolar transistors, because the horizontal MOS transistors operate in the ohmic region. All the previous circuits, however, can be implemented using bipolar transistors without any modification to the structure of the networks.
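The exponential impulse response can be checked numerically. Substituting the translinear relations into the KCL equation at node $n$ gives $(1 + 2K) I_{out}(n) - K [I_{out}(n-1) + I_{out}(n+1)] = I_{in}(n)$, which is a tridiagonal linear system. The Python sketch below solves it for a single input impulse. It is a minimal illustration, not a model of the actual chip: the function names are hypothetical, and the array edges are treated as if the missing neighbor carried zero output current.

```python
import math


def diffuse(i_in, k):
    """Solve (1 + 2K) out[n] - K (out[n-1] + out[n+1]) = in[n].

    Uses the Thomas algorithm for tridiagonal systems. Edge cells are
    treated as if the missing neighbor had zero output (an assumption).
    """
    n = len(i_in)
    diag = 1.0 + 2.0 * k
    off = -k
    c = [0.0] * n  # modified upper-diagonal coefficients
    d = [0.0] * n  # modified right-hand side
    c[0] = off / diag
    d[0] = i_in[0] / diag
    for j in range(1, n):
        denom = diag - off * c[j - 1]
        c[j] = off / denom
        d[j] = (i_in[j] - off * d[j - 1]) / denom
    out = [0.0] * n
    out[-1] = d[-1]
    for j in range(n - 2, -1, -1):
        out[j] = d[j] - c[j] * out[j + 1]
    return out


def decay_ratio(k):
    """Per-cell attenuation r of an infinite network: K r^2 - (1 + 2K) r + K = 0."""
    return ((1.0 + 2.0 * k) - math.sqrt(1.0 + 4.0 * k)) / (2.0 * k)


if __name__ == "__main__":
    impulse = [0.0] * 201
    impulse[100] = 1.0
    response = diffuse(impulse, 4.0)
    print(response[101] / response[100], decay_ratio(4.0))
```

Away from the edges each cell's output is a constant fraction $r$ of its neighbor's, i.e. the response decays exponentially. For large $K$ the space constant $-1/\ln r$ approaches $\sqrt{K}$ cells, in agreement with the decay rate quoted above.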

Before ending this section I show another spatial smoothing circuit, again built using the resistive network concept. This circuit was first used for implementing a silicon retina [Mead 89b, Mead and Mahowald 88]. The schematic of the circuit is illustrated in Figure 7.33. The heart of the network is the “horizontal resistor” or HRES, which simulates a floating resistor. The OTA-like circuits are used to properly bias the two horizontal transistors. Assuming that some of the circuits are shared between neighboring cells, each cell uses 12 transistors. The circuit designed by Andreou and Boahen [Andreou et al. 91a, Andreou and Boahen 94b] uses only 2 transistors for each cell, a dramatic difference in the number of transistors.

Figure 7.29: Two-stage translinear spatial smoothing circuit.

Figure 7.30: Translinear spatial smoothing circuit with adjustable window shape.

Figure 7.31: Spatial smoothing circuit with adjustable window width.

Figure 7.32: Translinear analysis of the smoothing circuit by decomposing the horizontal transistors into two back-to-back transistors.

Figure 7.33: Another spatial smoothing circuit, with a large number of transistors. The OTA at each input converts the input voltage to a current and injects it into the network. The other OTA-like circuit (shaded triangle) is used to properly bias the horizontal transistors.

In this section I presented a series of snapshots of how a very simple and intuitive circuit can be turned into a very useful and powerful one. Of course the evolution of the actual circuits has not followed this straight path, but understanding the principles of operation of these circuits, and the slight variations that have marked this evolution, can certainly help in building more complex networks. Also, I have tried to apply the translinear principle as much as possible in analyzing the circuits, so there is virtually no mention of the complex transistor equations.

### 7.5.3 Nonlinear Resistive networks

As in every other system, nonlinearity can introduce functional capabilities not available with linear elements. Basically all active resistors that have been used so far for implementing the resistive elements are nonlinear, although the nonlinearity has not been a major goal in these networks. Compared with linear resistive networks, designing nonlinear resistive networks poses a twofold problem. Firstly, a model should exist which utilizes a nonlinear resistive element, and of course demonstrates an improvement. Secondly, a suitable resistive element should be found which realizes the required nonlinearity. Considering that even designing a linear resistor to operate under various conditions is a great challenge, the difficulty in designing networks using nonlinear elements becomes obvious.

Only one form of nonlinear element has been reported and used for vision chips, namely the resistive fuse. As described in section 2.15, the resistive fuse provides smoothing only when the image contrast is small; when the contrast is large no smoothing is performed. Therefore, edges with large contrast, which carry useful information, are protected from being smoothed out.

### 7.5.4 Resistive Circuits

Implementing a resistive network in VLSI can be achieved using passive or active elements. In a CMOS process there are several passive elements that can be used as resistors: diffusion resistance, well resistance, and poly resistance\(^3\). All these elements have their own advantages and disadvantages. However, the resistance values achievable using these passive elements are far below what is needed in a resistive network. One reason is power consumption: if the resistances are small, a large amount of power will be consumed in the resistors in order to have sufficiently large voltages at the nodes of the network. Another reason is that the input currents are usually very small, and therefore require large resistances to produce voltages large enough to be detected at the output. For example, a typical current of 10 nA requires a resistance of 100 MΩ to produce a 1 V voltage difference.
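The arithmetic behind this example, and what it implies for a passive resistor, can be written out in a few lines of Python. This is a back-of-the-envelope sketch only. The function names are hypothetical, and the sheet resistance of 50 Ω/square is taken from the upper end of the range quoted in the footnote for non-silicided gate material.

```python
def required_resistance(voltage, current):
    """Ohm's law: resistance needed to develop `voltage` from `current`."""
    return voltage / current


def squares_needed(resistance, sheet_resistance):
    """Number of squares of a resistive layer needed for `resistance`."""
    return resistance / sheet_resistance


if __name__ == "__main__":
    r = required_resistance(1.0, 10e-9)  # 1 V from 10 nA
    n_sq = squares_needed(r, 50.0)       # 50 ohm/square layer (assumed value)
    print("resistance [ohm]:", r)
    print("squares of resistive layer:", n_sq)
```

A resistor of this value would need on the order of millions of squares per element, which is the practical reason passive resistors are ruled out for large networks.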

Therefore, it is clear that active resistors are needed for implementing resistive networks. There are many active resistor circuits, a large number of which operate in the saturation region. Despite all the effort that has gone into designing resistors in the saturation region, they are not typically useful for resistive networks, as the dynamic range of these resistive circuits is small and, more importantly, they cannot provide very large resistances (> 10 MΩ).

We now review several resistive elements that have been used in some vision chips. The first circuit, which is known as HRES (Horizontal RESistor) [Mead 89b], is shown in Figure 7.34. It can be shown that in the subthreshold region

\[
I = I_0 e^{V_{bias} / n V_T} \tanh \frac{V_1 - V_2}{2}
\]

(7.46)

where \(I_0\) and \(n\) are process dependent parameters. In order to realize the voltage sources
represented by \(V_{bias}\), one of the circuits in Figure 7.35 can be used. Although ideally all
these circuits seem to generate a voltage drop, controllable by the biasing currents \(I_{bias}\),
there are important notes to be given on each circuit.

The first circuit, which is used in the original HRES, operates properly in almost all conditions when it is biased in subthreshold. However, if one of the inputs, say \(in1\), is held at a low voltage, the output voltage of the associated OTA will stay at a constant voltage because the input branch is cut off. Therefore, the \(V_{gs}\) of M1 will exceed the desired value and a large current can pass through the resistor.

\(^3\)In recent processes the gate poly is silicided, which significantly reduces the sheet resistance. For example, a silicided poly has a sheet resistance of 2 Ω/square, while for an amorphous silicon gate this is about 20-50 Ω/square.

The second circuit (Figure 7.35-b), which is a simple source follower stage, has a gain less than unity. This means that for large input voltages the voltage drop increases; hence, large currents may pass through the resistor. This circuit cannot be used for subthreshold operation, and in a few vision chips it has been used in the saturation region.

Resistive fuses as described in section 2.15 are useful for image segmentation. Two circuits for implementing a resistive fuse were presented in the same section. Another circuit introduced by F.A. van Schaik [van Schaik 93] is a variation of the original HRES circuit, but with segmentation capabilities similar to resistive fuses, and without their local minimum problem. The main difference between this resistive circuit and HRES is that the input of the OTA is provided by a modulating voltage $V_{modulation}$, as shown in Figure 7.36.

### 7.5.5 CCD Circuits for Spatial Processing

CCD devices are based on handling charge packets in potential wells, generated by applying a voltage to the gate of a MOS capacitor structure. CCDs have no controllable static charge distribution mechanism that would allow spatial processing to be developed in the same way as in CMOS. However, a charge packet stored in a potential well can be distributed to its neighbors through a series of clocks. In fact a major factor determining the function of CCD devices is the clocking sequence and method. Unfortunately, although it is possible to perform charge redistribution, the only operations easily feasible are addition and subtraction of charge packets. There are some vision chips using mixed CCD/CMOS circuits [Dron 93, Keast and Sodini 92]. CCDs offer easier solutions for some operations. For example, in a smoothing CCD vision chip the smoothing width can be increased simply by letting the circuit operate over more clock cycles. In other words, CCDs are capable of iterating a function without additional demands on space.

## 7.6 Spatio-Temporal Processing

Spatio-temporal image processing involves an extra dimension of information in addition to the spatial ones, i.e. temporal information. It is known that temporal information, usually addressed in the context of motion detection, can provide extra cues about the contents, structure, and other high or low level information present in a scene. This belief is strongly supported by experiments on insects, species with a relatively primitive visual system that are nevertheless very capable of performing visual tasks. Insects rely heavily on motion detection in avoiding obstacles, landing, tracking, estimating range, and so on.

In spatio-temporal image processing the intensity values should be considered across the spatial and temporal axes. Hence, past values of image intensity should somehow be stored or delayed to be used in processing. As already hinted by the words “store” and “delay”, two methods can be used for involving past image information: sampled-data and continuous. In sampled-data methods the image is sampled and stored in analog memory elements. In continuous methods analog delay elements are used. A challenging part of designing spatio-temporal vision chips arises from the problems associated with analog memory and delay elements.

Fundamentally, an analog storage element can consist of either a capacitance holding charge or an inductor holding current. With the latter being infeasible in standard VLSI processes, the former is the only choice. Due to the leakage existing in any capacitive node, large capacitances should be used to increase the so-called “charge retention time”. There are two main categories of charge storage: a DRAM style structure, and a floating gate structure. DRAM type memories can hold the charge only up to a few seconds, but floating gate devices can achieve retention times of several hundred years.

Delaying a signal in the analog domain again requires a capacitive node, which can hold the information. However, in continuous delay elements charge is continuously injected into the capacitor and read out.

It is necessary to point out the fact that detecting intensity variations over time for single pixels cannot be regarded as motion detection in any sense. The term “motion” has a spatial as well as temporal component built into it.

In the following sections circuits utilized for storing and delaying charge are described.

### 7.6.1 Analog Memory Elements

The simplest structure for storing charge is the DRAM style cell shown in Figure 7.37. However, the leakage current of the source of the switch transistor limits the charge retention time to at most a few seconds for digital DRAM storage cells. The acceptable retention time for an analog application obviously depends on the resolution required. Taking the resolution into account, the acceptable retention time drops from that for DRAM cells by about two orders of magnitude (for 8 bit resolution, by a factor of about 256). This circuit is useful only for very short term storage, such as in small imagers with fast frame rates.
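The dependence of retention time on resolution can be made concrete with a small Python sketch. It assumes a constant leakage current, so that the stored voltage droops linearly, and defines the retention time as the time taken to lose one least significant bit. The function name and all component values (1 pF, 1 pA, 1 V swing) are hypothetical and chosen only to give round numbers.

```python
def retention_time(capacitance, leakage, swing, bits):
    """Time for a constant leakage current to discharge the node by one LSB.

    t = C * (swing / 2**bits) / I_leak. Setting bits=0 allows a droop of the
    full swing, used here as a stand-in for a digital DRAM cell.
    """
    lsb = swing / (2 ** bits)
    return capacitance * lsb / leakage


if __name__ == "__main__":
    c_store = 1e-12  # 1 pF storage capacitor (assumed)
    i_leak = 1e-12   # 1 pA junction leakage (assumed)
    v_swing = 1.0    # 1 V signal swing (assumed)
    for bits in (0, 4, 8):
        t = retention_time(c_store, i_leak, v_swing, bits)
        print(bits, "bits:", t, "s")
```

Under these assumptions the 8 bit retention time is 1/256 of the full-swing value, which is the reduction quoted above.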

The storage capability of the cell can be improved by using several techniques, such as differential storage and leakage reduction. In the differential storage technique the original signal is translated into a differential signal and stored on two similar storage devices. As the leakage reduces the charge almost equally at both nodes, the difference remains the same. This method can increase the storage time several times. A drawback of this technique is the additional area consumed by the single-ended-to-differential translation and the extra capacitance.

In leakage reduction techniques the leakage of the source/drain diffusion of the switching transistor at the storage node is reduced by setting the voltage across the anode and cathode of the source diffusion-well diode to zero [Vittoz et al. 91], as shown in Figure 7.38. Using this circuit, storage times of up to several seconds can be achieved in normal conditions.

Floating gate structures have long been used in EPROM devices. Their storage capability for analog signals has also been used in many implementations of analog systems. Despite the long term storage achieved with floating gate structures (in the order of several years), the accuracy, programming, and reprogramming of these devices remain challenging. Floating gate devices can be found either in special processes, where thin-gate devices are available for low voltage programming, or in standard processes, where the gate of a normal transistor is left floating. The floating gate devices in standard processes require high programming voltages, which might exceed the breakdown voltages of different junctions in the process, or they may need acceleration mechanisms such as exposing the chip to UV light. The accuracy achieved using floating gate devices is around 6 to 8 bits.

### 7.6.2 Continuous Delay Elements

In continuous spatio-temporal processing chips, delay elements are used to retain past information. Realizing ideal controllable delay elements is very difficult, if not impossible. The delay element is usually approximated by circuits such as integrators. Figures 7.39-a and 7.39-b show two basic voltage mode circuits for delaying signals. Both circuits distort the input signal in both phase and amplitude. However, this can be tolerated in many vision chips. The current mode delay element in Figure 7.39-c has been used as an essential building block in current mode circuits\(^4\).

The amount of delay in the RC network depends on the resistor value, in the OTA-C circuit on the bias current, and in the current-mode delay element on the input current level.

In order to achieve large delay times using a conventional OTA-C circuit, very small biasing currents are required. This imposes several problems, including increased mismatch at low current levels, and sensitivity to different noise sources. This requirement can be reduced by using linearization techniques applied to the OTA [Furth and Andreou 95, Moini et al. 97b].
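To give a feeling for how small these bias currents are, the following Python sketch estimates the bias current of an OTA-C stage for a given time constant. It is a rough estimate under stated assumptions, not a design equation from this report: it takes $\tau = C/g_m$ and uses the textbook transconductance of a simple subthreshold differential pair, $g_m = \kappa I_b / (2 U_T)$. The function name, $\kappa = 0.7$, and the component values are hypothetical.

```python
def ota_c_bias_current(delay, capacitance, kappa=0.7, u_t=0.0258):
    """Bias current giving time constant `delay` for a subthreshold OTA-C stage.

    Assumes tau = C / g_m with g_m = kappa * I_b / (2 * U_T), the small-signal
    transconductance of a simple subthreshold differential pair.
    """
    g_m = capacitance / delay
    return 2.0 * u_t * g_m / kappa


if __name__ == "__main__":
    for tau in (1e-3, 10e-3, 100e-3):
        i_b = ota_c_bias_current(tau, 1e-12)  # 1 pF load (assumed)
        print("tau =", tau, "s -> I_bias =", i_b, "A")
```

With a 1 pF capacitor, time constants in the millisecond range already call for bias currents in the picoampere range, where mismatch and noise become serious. Linearizing the OTA lowers its effective transconductance, so the same delay can be reached with a larger bias current.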

Figure 7.39: a) an RC circuit used as a delay element, b) an OTA-C circuit as a delay element, c) a current mode delay element.

\(^4\)Dynamic current mirrors are an example of sampled data storage element. However, the principal function of these circuits is to store the gate voltage. Therefore, the storage is essentially working in voltage mode.

## 7.7 Adaptation

Any system that is to operate over a very large dynamic range, but whose subsystems can only operate over a small dynamic range, needs an adaptation mechanism. In amplifier design this mechanism is known as automatic gain control (AGC). Adaptation should generally be incorporated at several levels of the system hierarchy to result in a large total operating range. For example, in a vision system, adaptation to light intensity level can be achieved by using a mechanical iris, photodetectors with adaptive sensitivity, photocircuits with adaptation capability, and adaptive processing elements. Adaptation in vision systems may also happen locally or globally, over a space or time interval. Controlling an adaptive structure requires feedback from different levels of the system hierarchy. Figure 7.40 illustrates a general light adaptive vision system. If each level of the system is capable of functioning over only three decades, a total of twelve decades of adaptation can be achieved. Overlap between adaptation regions may reduce this, but this demonstrates the capability of a hierarchical adaptation mechanism. This is true if the characteristics of each level are independent of each other.

In any case the total dynamic range of the system is limited by the first limiting front-end (usually photodetectors and photocircuits). So, if the front-end can only function over a dynamic range of 8 decades, the dynamic range of the system cannot exceed this. But, for example, a mechanical iris does not theoretically put any limitation or nonlinearity on the incoming light. Therefore, it can increase the dynamic range of the system by a large amount. However, this adaptation would be global to all the pixels in a vision chip.

In the following sections several adaptation mechanisms and circuits that can be built into vision chips are presented.

### 7.7.1 Light Adaptive Photodetectors

The sensitivity of a photodiode as derived in section 7.2.1 can only be changed by the applied voltage across the diode. The relationship between the output photocurrent and the voltage across the diode is \( I_{\text{photo}} \propto \sqrt{V} \). This amount of controllability is certainly not enough for practical applications. A structure that can result in several orders of magnitude of adaptation is the multisensitivity sensor presented in section 2.17, shown again in Figure 7.41. Using this structure the dynamic range can almost be tripled. For low light levels the Darlington pair provides a high current gain. For mid levels only one bipolar transistor is activated, and for high light levels both bipolar transistors are deactivated.

The signal to noise ratio for all three configurations remains almost the same. Of course, the current levels at the output will be in a more manageable range for handling by subsequent circuits.

The variable sensitivity photodetector (VSPD) mentioned in section 5.1 also provides a mechanism which controls the sensitivity of the photodetector. The VSPD which is similar to the structure shown in Figure 7.1-e provides a linearly controlled sensitivity by the voltage applied across the photodetector.

### 7.7.2 Light Adaptive Photocircuits

As was seen in the previous section, light adaptation in photodetectors is rather limited. This is due to the nature of photodetectors, i.e. hard-wired structures. There is more flexibility in the design of photocircuits, as evidenced in section 7.3. Almost all structures described in that section provide some method of adaptation. Those photocircuits with logarithmic compression already have a built-in adaptive mechanism. The logarithmic compression provides a large dynamic range of operation. The boundaries of the compression are determined by uncontrollable transistor characteristics. In the lower end it is limited by the leakage current and noise, and at the higher end by the saturation of transistors. Adaptation in these photocircuits is local. The reader is referred to the material in section 7.3 for more information on adaptive photocircuits.

Figure 7.40: Hierarchical light adaptive architecture.

In integration based photocircuits, used in many imagers, adaptation can be achieved by controlling the integration time. For low light levels long integration time can be used and for higher light levels, shorter integration time. In the lower end of light intensity the integration time is limited by the dark current, and at the higher end by the maximum clock frequency.

### 7.7.3 Light Adaptive Architectures

A higher level of adaptation can be incorporated in the architecture and algorithm of a vision chip. This adaptation scheme, which can be called algorithmic adaptation, provides a more systematic approach to adaptation than an intuitive one. Algorithmic adaptation should be planned when developing the algorithm [Kobayashi et al. 95b]. In this adaptation mechanism a measure of the outputs of the system is provided, and based on that measure, the biasing conditions are controlled. Depending on how the new biasing condition is chosen, several mechanisms can be distinguished.

- **Adaptation based on mean value:** This method finds the mean of variables at a level of the hierarchy and feeds back that value to control the bias. Using current mode addition, this method can easily be implemented in VLSI. The function of this method can be formulated as

\[
\text{bias} = f(\text{mean}(X_i))
\]

(7.47)

where \(X_i\) are the variables at a level of hierarchy.

- **Adaptation based on mean error or mean square error**: Mean error methods can provide information about the deviation of the variables at a level in the hierarchy. This can also be interpreted as an indication of the level of activity. Therefore, using this method the activity of the system can be controlled. This method is formulated as

\[
\text{bias} = f(\text{rms}(X_i - \text{mean}(X_i)))
\]

or

\[
\text{bias} = f(\text{mean}(\text{abs}(X_i - \text{mean}(X_i))))
\]

(7.48)

- **Adaptation based on maximum value**: This method decides upon the largest value of the variables. The hardware realization of this method is known as the winner-take-all circuit. The maximum value adaptation is useful for tuning the gain of the circuits so that their outputs swing at full scale. This method can be described as

\[
\text{bias} = f(\text{max}(X_i))
\]

(7.49)

Choosing an adaptation mechanism depends on the statistics of the signals at the specific hierarchical level. The function \( f() \) is also dependent on the method chosen, and the circuits that are to be controlled.
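The three measures can be compared on a common set of values with the Python sketch below. It only illustrates the statistics involved and says nothing about their circuit realization. All function names are hypothetical, and the default choice of \( f() \) as a simple reciprocal (a gain that falls as the measure rises) is an assumption made for the example.

```python
import math


def mean(values):
    """Arithmetic mean of the variables X_i."""
    return sum(values) / len(values)


def mean_abs_deviation(values):
    """Mean absolute deviation of X_i from their mean."""
    m = mean(values)
    return mean([abs(v - m) for v in values])


def rms_deviation(values):
    """Root mean square deviation of X_i from their mean."""
    m = mean(values)
    return math.sqrt(mean([(v - m) ** 2 for v in values]))


def adapt_bias(values, measure, f=lambda s: 1.0 / s):
    """bias = f(measure(X_i)); the reciprocal default for f is hypothetical."""
    return f(measure(values))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 6.0]
    for name, measure in (("mean", mean),
                          ("mean abs deviation", mean_abs_deviation),
                          ("rms deviation", rms_deviation),
                          ("max", max)):
        print(name, measure(x), adapt_bias(x, measure))
```

The deviation-based measures respond to how much the signals vary rather than to their level, while the maximum is dominated by a single cell, which is why the signal statistics decide which measure is appropriate.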

### 7.7.4 Spatial Adaptation Models

Adaptation at the light level can be achieved by using spatial information and removing the redundant parts of the information. In its simplest and most intuitive form the global spatial average can be used as the common signal among all detectors. The average can be removed by subtraction or division. The dynamic range will still be limited to a global value. Removing the local average instead of the global average can increase the dynamic range by several orders of magnitude, because each region in the image only adapts to its own average. Several models for spatial adaptation have been proposed, which we will review in this section. One may find similarities between the response of the systems based on these models and those of biological retinas. Some of the salient features of the spatial response of the biological retina are (see for example [Nabet and Pinter 91]):

- **Dynamic range reduction**: The output signals of the retina have a much smaller dynamic range than its inputs. Therefore, further processing stages do not need to cope with large dynamic range.

- **Edge enhancement**: Edges are very important for every image processing task. By enhancing the edges further processing stages will receive more reliable inputs.

- **Intensity dependent spatial response**: At low light levels, a biological retina acts as a smoothing filter, and at high intensities it performs as a spatial band-pass filter. Also as the light level decreases the span of the spatial processing widens.

These properties have been depicted in Figure 7.42. Now let us review the models.

1. **Subtraction from local average**

In this method the local average is subtracted from the signal in each cell. Sometimes, two local averages with different spatial distributions are subtracted from each other. This method has two disadvantages. Firstly, the signal will be centered around “zero”, and the signal variation will depend on the local average. For example, if the average current is about 1nA the signal variation will be around this value. Secondly, this method cannot reproduce the intensity dependent response. Due to its intuitive nature this method has been used in many VLSI implementations of the retina [Mead and Mahowald 88, Bair and Koch 91a, Wu and Chiu 95].

2. **Division by local average**

In this method the signal from one cell is divided by the local average. One main advantage of this method over subtraction is that the output is now centered around “one”. Therefore, the output can be normalized to a desired value. In the subtraction method, if an offset (for example 100nA) is added to move the center away from “zero”, small values may get lost. Another advantage of this method is its multiplicative noise cancellation (MNC) feature. By dividing the signal by the local average, AC noise from artificial light sources, which is reflected from the surface of objects (and hence has a multiplicative nature), can be reduced to a small fraction. In fact this method was first introduced for this purpose [Moini et al. 95a, Moini et al. 97b]. It has been used as a pre-processor in a motion detection chip to reduce the effect of AC light sources [Moini et al. 97b, Moini et al. 95b].

This method also cannot reproduce the intensity dependent behavior.

3. **Surface reconstruction from noisy data**

This model was originally developed as a way of reducing the noise and recovering the signal from noisy data. Using regularization theory, one can find the solution to this problem using the biharmonic equation (see section 2.6). The VLSI implementation of this model can produce a normalized output, and demonstrates the dynamic range improvement and edge enhancement features. Although an intensity dependent response has been reported using the implemented circuits, the change of the response is opposite to that in the biological retina, i.e., at higher light levels the spatial span of the impulse response becomes wider [Boahen and Andreou 92, Andreou and Boahen 94b].

The theoretical model also does not originally account for intensity dependent characteristics, but it can be modified to include this feature too.

4. **Linear lateral inhibition**

Linear lateral inhibition is a simple form of lateral inhibition, where fractions of the signals of the neighboring cells are subtracted from the signal in each cell. This model can demonstrate the edge enhancement and dynamic range improvement features. However, it still cannot reproduce the intensity dependent behavior. This method has been used in the implementation of a few shunting inhibition vision chips [Wolpert and Micheli-Tzanakou 93].

5. **Multiplicative lateral inhibition (Shunting inhibition)**

In shunting inhibition (SI) a proportion of the output signals of each cell and its neighbors are subtracted from the signal in each channel (See sections 2.19 and 3.27). This model has in fact been developed to model the behavior of the biological retina. It has demonstrated all properties of the biological retina. Several vision chips have been designed based on shunting inhibition [Darling and Dietze 93, Moini et al. 97a].

It is now clear that many VLSI implementations of the retina have deficiencies, in the context of replicating the function of the retina; and only shunting inhibition can be regarded as the closest model for replicating the functionality of the retina.

However, in the context of providing a vision chip which can improve the dynamic range, enhance the edges, and yet be VLSI friendly, the “division by average” model and the model based on the biharmonic equation can also be considered viable options.
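The difference between the first two models can be illustrated with the following Python sketch. It applies subtraction and division by a local average to the same one-dimensional scene at two illumination levels. The function names are hypothetical, the local average is a simple box average whose window is clipped at the array edges, and all signals are assumed to be positive currents so that the division is well defined.

```python
def local_average(signal, radius=1):
    """Box average over 2*radius+1 cells; windows are clipped at the edges."""
    n = len(signal)
    out = []
    for k in range(n):
        lo = max(0, k - radius)
        hi = min(n, k + radius + 1)
        cells = signal[lo:hi]
        out.append(sum(cells) / len(cells))
    return out


def subtract_local_average(signal, radius=1):
    """Model 1: signal minus its local average (centered around zero)."""
    avg = local_average(signal, radius)
    return [s - a for s, a in zip(signal, avg)]


def divide_by_local_average(signal, radius=1):
    """Model 2: signal divided by its local average (centered around one)."""
    avg = local_average(signal, radius)
    return [s / a for s, a in zip(signal, avg)]


if __name__ == "__main__":
    scene = [1.0, 1.0, 2.0, 2.0]          # an edge in reflectance
    bright = [100.0 * s for s in scene]   # same scene, 100 times more light
    print(subtract_local_average(scene), subtract_local_average(bright))
    print(divide_by_local_average(scene), divide_by_local_average(bright))
```

The output of the subtraction model scales with the illumination, whereas the output of the division model is unchanged. This is the multiplicative noise cancellation property: any factor common to a cell and its neighborhood cancels in the ratio.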

## 7.8 Practical issues in designing vision chips

### 7.8.1 Mismatch

Mismatch has been the worst limiting factor in designing analog VLSI systems, including vision chips. Mismatch can be regarded as a spatial noise spread over the surface of a vision chip. The main effects of mismatch on system performance are: dynamic range reduction due to increased spatial noise level, precision limitation, area increase, and power dissipation increase. When designing circuits all these parameters should be traded off against each other. In the absence of mismatch minimum size transistors, with minimum area and minimum capacitances could be used. As a result power dissipation could also be reduced as the loads in the circuit are decreased.

Mismatch in CMOS circuits stems from three main sources [Bastos et al. 95, Pavasovic et al. 94a, Pavasovic et al. 94b, Steyaert et al. 94, Forti and Wright 94]. The first one is the physical variation of device dimensions. For example the variation of the gate length and width in a 2µm process can be up to several tenths of a micron. The only way to reduce the effect of this source of mismatch is to use large devices, such that the effect of the variation, which often occurs at the edges of the device, can be neglected.

The other source of mismatch is the metallurgical variation of device parameters, which mainly includes the variation of doping densities in the semiconductor. This type of mismatch can also be reduced by using large size transistors.

The third source of mismatch is from some electronic parameters of the device. For example, the trapped charges in the gate oxide, or the surface states in a MOS transistor can change the threshold voltage of the device.

From these sources the third one is more prevalent in MOS transistors, and it is concluded that the devices which are affected by the surface properties of the semiconductor will have more mismatch than those which mainly depend on the properties of the semiconductor away from the surface. This is in fact the main reason why BJTs and junction diodes have less mismatch than MOS devices.

Mismatch in MOS devices depends on the following parameters:

- **Transistor size**: Mismatch decreases as the area of the transistor increases (the variance of the mismatch is inversely proportional to the area), provided that both the length and the width of the transistor are large enough to avoid short-channel and narrow-width effects.

- **Separation of transistors**: It is well known that for good matching transistors should be laid out as close together as possible. In small devices the mismatch caused by separation is usually masked by the mismatch due to channel area. However, for large enough transistors the effect of distance can be observed [Bastos et al. 96].

- **Current level**: Mismatch depends directly on the amount of current passing through the transistors which are to be matched [Forti and Wright 94]. In the subthreshold region the relative mismatch ($\frac{\Delta I}{I}$) is almost constant, but as the transistor enters the above-threshold region it decreases at an exponential rate.

In general, finding the total mismatch of a network requires special treatment of that network, and if the circuits operate at different current levels and have different dimensions the solution requires nonlinear analysis of the network. In the special case of translinear circuits, which have been widely used in the design of many vision chips, a simplified analysis shows that mismatch in the output is proportional to $\sqrt{N}$, where $N$ is the number of transistors in the circuit. MOS translinear circuits all operate in the subthreshold region, and their mismatch is therefore almost independent of current level. However, one should notice that in subthreshold the amount of mismatch can be higher by more than one order of magnitude, and therefore subthreshold circuits should be avoided as much as possible if one is concerned about mismatch.

Circuit simulation tools, such as HSPICE, can be used to find the mismatch by applying constant inputs to the network and assuming mismatch levels for the threshold voltage and transconductance of the individual transistors in the network. Principal component analysis shows that mismatch can be associated with these two parameters only.
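The same idea can be tried outside a circuit simulator with a small Monte Carlo experiment. The sketch below is only illustrative: it assumes a textbook exponential law in subthreshold and a square law above threshold, draws the threshold voltage and a normalized transconductance from Gaussian distributions whose spread shrinks with the square root of the gate area, and reports the relative current mismatch. The coefficients `A_VT` and `A_BETA`, the slope factor `N_SLOPE`, and the nominal threshold `V_T0` are assumed values, not data from any process or from the cited papers.

```python
import math
import random
import statistics

U_T = 0.0259      # thermal voltage near room temperature, in volts
N_SLOPE = 1.5     # subthreshold slope factor (assumed)
V_T0 = 0.7        # nominal threshold voltage in volts (assumed)
A_VT = 0.010      # threshold matching coefficient in V*um (assumed)
A_BETA = 0.02     # relative transconductance matching coefficient in um (assumed)


def sigma_from_area(coefficient, width_um, length_um):
    """Standard deviation that shrinks with the square root of the gate area."""
    return coefficient / math.sqrt(width_um * length_um)


def subthreshold_current(v_gs, v_t, beta):
    """Normalized exponential law, used here only for gate voltages below threshold."""
    return beta * math.exp((v_gs - v_t) / (N_SLOPE * U_T))


def square_law_current(v_gs, v_t, beta):
    """Normalized square law, used here only for gate voltages above threshold."""
    return 0.5 * beta * (v_gs - v_t) ** 2


def relative_mismatch(model, v_gs, width_um, length_um, samples=5000, seed=1):
    """Estimate sigma(I)/mean(I) for nominally identical devices at one gate voltage."""
    rng = random.Random(seed)
    s_vt = sigma_from_area(A_VT, width_um, length_um)
    s_beta = sigma_from_area(A_BETA, width_um, length_um)
    currents = []
    for _ in range(samples):
        v_t = rng.gauss(V_T0, s_vt)      # threshold voltage of this device
        beta = rng.gauss(1.0, s_beta)    # normalized transconductance of this device
        currents.append(model(v_gs, v_t, beta))
    return statistics.stdev(currents) / statistics.mean(currents)


if __name__ == "__main__":
    for v_gs in (0.30, 0.40):
        print("subthreshold", v_gs, relative_mismatch(subthreshold_current, v_gs, 2, 2))
    for v_gs in (1.2, 2.0):
        print("above threshold", v_gs, relative_mismatch(square_law_current, v_gs, 2, 2))
```

Under these assumptions the relative mismatch in subthreshold does not change with gate voltage, while above threshold it falls as the gate overdrive grows; enlarging the device reduces it in both regions.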

7.8.2 Digital noise

Digital noise stems from the switching transients of digital circuits. In all vision chips digital signals are present at least to scan the outputs of the array out of the chip. There may also be some digital circuits for performing on-chip processing. The effect of digital noise on analog circuits is related to the distance between the two circuits. For small distances there is a linear relationship between the distance and the amount of digital noise [Masui 92, Su et al. 93, Kerns et al. 96, Verghese et al. 96]. As the distance increases the noise remains almost constant. This has been associated with the noise coupling through the bulk substrate. There can also be some direct capacitive or resistive coupling between the switching signals and the nodes in the analog circuits. There are some techniques to partially reduce the effect of digital noise [Makie-Fukuda et al. 95, Basedau and Qiuting 95].

Most of the studies on modeling and characterizing digital noise have focused on separate analog and digital modules. However, in vision chips there is direct coupling between at least the digital scanning signals and the analog biasing or read-out lines. If the only digital circuit on the chip is the scanning circuitry, one can reduce the effect of digital noise by letting the signals settle after every digital transition. In motion detection chips, however, this is not feasible. Motion detection vision chips operating in continuous mode are more susceptible to digital noise, as the circuits usually have to operate in subthreshold and must also be very sensitive in order to detect images with very low temporal contrast [Moini et al. 96]. In these circuits the scanning signals should be carefully designed to have slow transitions.
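A rough feel for both remedies (slow transitions, and waiting for signals to settle) can be obtained from two elementary relations. The sketch below assumes a linear voltage ramp across a lumped coupling capacitance, so that the injected current is $C\,\Delta V/\Delta t$, and a single-pole exponential settling of the disturbed analog node. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import math


def coupled_current(c_couple_f, v_swing_v, t_transition_s):
    """Average displacement current injected through a coupling capacitance
    while a digital signal ramps linearly by v_swing_v in t_transition_s."""
    return c_couple_f * v_swing_v / t_transition_s


def settling_time(tau_s, relative_error):
    """Time for a single-pole node to settle to within relative_error
    of its final value after a disturbance."""
    return tau_s * math.log(1.0 / relative_error)


if __name__ == "__main__":
    # Assumed values: 10 fF coupling capacitance, 5 V logic swing.
    fast = coupled_current(10e-15, 5.0, 1e-9)   # 1 ns edge
    slow = coupled_current(10e-15, 5.0, 1e-6)   # 1 us edge
    print("injected current, fast edge:", fast, "A")
    print("injected current, slow edge:", slow, "A")
    print("wait for 1% settling with tau = 1 us:", settling_time(1e-6, 0.01), "s")
```

With these assumed numbers the fast edge injects a current far above typical subthreshold bias levels, while the slow edge injects a thousand times less.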
7.9 Testing vision chips

7.9.1 Design for Testing

Adopting a test strategy is important for a vision chip, although it is often neglected as a side issue. This could be because of the nature of the vision chips designed so far, as the majority have been research prototypes with an emphasis on algorithm implementation or circuit design techniques, rather than on system integration and interfacing.

Some of the important issues that may be considered in relation to testing are listed below.

- **Scanning:**
  Scanning signals can be generated by either a decoder or a shift register. Shift register scanning is more desirable if only serial access is needed, as it requires only a few signals to operate (input, reset, and clock). Also, when using a shift register, several channels can be selected simultaneously, and if the outputs of the cells in the vision chip are currents, the sum of the selected channels is obtained at no computational cost. This method has been used by [Funatsu et al. 95a] and [Lange et al. 93] in their vision chips. Both methods are illustrated in Figure 7.43.

- **Read-out:**
  The output of the analog vision chips can be either current or voltage. In either case proper read-out circuits are required to ensure minimum distortion and fast acquisition.

  For current and voltage read-out there are many circuits available. One should make sure that the operating range of these circuits matches the range of the output signals of the vision chip array. Especially for current read-out circuits, the range may be several decades. If the current levels are very low, on the order of picoamperes and nanoamperes, the read-out circuits should be very sensitive and special read-out schemes should be used.

A voltage read-out circuit that is widely used in active pixel sensors consists of the input transistor of a source follower included in the photocircuit and the biasing transistor common for all the pixels. The circuit schematic is shown in Figure 7.44.

Current read-out in its simplest form can be performed using a resistive element, either passive or active. In either case, as the voltage at the read-out node depends on the current level, and the charge and discharge time also depends on the current, very slow responses are to be expected for small currents, which are very common in many vision chips. A better circuit for reading current is shown in Figure 7.45-b. In this circuit the voltage at the read-out node is fixed using the Op-Amp. Hence the current does not need to charge or discharge this node. Instead of a linear resistor in the feedback loop, an element with a logarithmic characteristic (a MOS or junction diode) can be used. This is especially useful for reading currents which vary over several decades. If a capacitor is used instead of the resistor, as shown in Figure 7.45-c, a charge integration read-out is obtained. This circuit is very widely used in current read-out circuits, due to its linearity.

If the algorithm and architecture allow it, it is often easier to use the charge integration technique (the same as in integration-based photocircuits) to perform a linear current-to-voltage transformation at the pixel level (see Figure 7.45-d). This alleviates many of the requirements for current read-out circuits.
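The orders of magnitude involved in the read-out schemes discussed above can be checked with a few lines of arithmetic. The sketch below assumes ideal components: a constant input current, an ideal integrating capacitor (output voltage $V = I\,T/C$), a read-out node that must be slewed by the signal current alone ($t = C\,\Delta V/I$), and an ideal exponential diode for the logarithmic element. Function names and component values are hypothetical.

```python
import math

U_T = 0.0259  # thermal voltage near room temperature, in volts


def integration_voltage(current_a, t_int_s, c_int_f):
    """Output voltage magnitude of an ideal charge-integration read-out."""
    return current_a * t_int_s / c_int_f


def node_slew_time(current_a, c_node_f, delta_v):
    """Time the signal current needs to move a capacitive node by delta_v."""
    return c_node_f * delta_v / current_a


def log_readout_voltage(current_a, i_ref_a, slope_factor=1.0):
    """Voltage across an ideal logarithmic element relative to a reference current."""
    return slope_factor * U_T * math.log(current_a / i_ref_a)


if __name__ == "__main__":
    # A 1 pA signal needs 0.1 s to move a 1 pF node by 0.1 V.
    print("slew time:", node_slew_time(1e-12, 1e-12, 0.1), "s")
    # Integrating 1 nA for 1 ms on 1 pF gives a linear 1 V output.
    print("integrated output:", integration_voltage(1e-9, 1e-3, 1e-12), "V")
    # Three decades of current map to well under a volt of swing.
    print("log output span:", log_readout_voltage(1e-9, 1e-12), "V")
```

The first figure illustrates why an unbuffered read-out node is slow at low currents, and the last why a logarithmic element is convenient when currents span several decades.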

![Figure 7.44: A voltage read-out circuit using a source follower.](image)

- **On-chip Conversion:**
  Analog to digital conversion (ADC) is a necessary step for most vision chips which have to interact with a digital processing unit. ADC can be performed off-chip when there is not much demand for system integration. However, when system-level integration is important, for example in digital active pixel sensors, on-chip ADC becomes necessary. There are many different ADC circuits and techniques available, from which one can choose based on speed, precision, power, and area requirements.

  In addition to the type of ADC, the number of ADC circuits implemented on the chip can also be selected based on demand. If a parallel ADC method is used, as described in section 2.8, the speed requirement of each ADC is reduced by a factor equal to the number of columns in the array. Other alternatives are to use only one ADC for the whole chip, or to use multiple ADCs (but fewer than the number of columns). Torelli shows that in many respects (power, area, and resolution) there is not much gain from using any one of these architectures over the others. However, a single ADC avoids the additional fixed pattern noise which can be introduced by multiple ADCs. On the other hand, the lower speed requirement of multiple ADCs may be considered an advantage.
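The speed trade-off between these ADC arrangements is simple arithmetic, sketched below under the assumption that every pixel is converted once per frame and that the conversions are shared equally among the ADCs. The array size and frame rate are hypothetical examples.

```python
def adc_sample_rate(rows, cols, frames_per_second, n_adcs):
    """Conversion rate each ADC must sustain when n_adcs share the array equally."""
    return rows * cols * frames_per_second / n_adcs


if __name__ == "__main__":
    rows, cols, fps = 128, 128, 30   # assumed array size and frame rate
    for n_adcs in (1, 8, cols):      # single, multiple, and column-parallel ADCs
        rate = adc_sample_rate(rows, cols, fps, n_adcs)
        print(n_adcs, "ADC(s):", rate, "conversions per second each")
```

For this example a single ADC must run at 491520 conversions per second, whereas each of 128 column-parallel ADCs needs only 3840.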

- **Visualization:**
  Vision chips consist of one- or two-dimensional arrays of cells. The output of a vision chip may or may not be visualized during its operation. In the most intuitive method, outputs can be scanned, digitized, and visualized on the monitor of a computer. This is particularly important for testing prototype vision chips. However, as the process of data acquisition and digital display on a monitor is usually slow, one may produce a standard video signal for the monitor, using on-chip or off-chip circuits. A scanner and circuits for this purpose have been described in [Delbrück and Mead 91a]. For more precise characterization one needs to use data acquisition techniques. Standard TV signal generation can also be incorporated as a separate module [Meitzler et al. 93, Meitzler et al. 95].
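The shift-register scanning scheme described under Scanning, including the selection of several channels at once with their currents summed on the read-out line, can be mimicked by a short behavioural model. This is a sketch only: the class `ShiftRegisterScanner` and its methods are hypothetical names, and the cell currents are arbitrary numbers standing in for nanoamperes.

```python
class ShiftRegisterScanner:
    """Behavioural model of a serial-in shift register driving channel selects."""

    def __init__(self, n_channels):
        self.state = [0] * n_channels

    def reset(self):
        """Deselect all channels."""
        self.state = [0] * len(self.state)

    def clock(self, data_in):
        """Shift the select pattern by one channel and load data_in at the input."""
        self.state = [data_in] + self.state[:-1]

    def read_current(self, cell_currents):
        """Current on the shared read-out line: the sum over all selected cells."""
        return sum(i for sel, i in zip(self.state, cell_currents) if sel)


if __name__ == "__main__":
    cells = [1.0, 2.0, 3.0, 4.0]      # assumed cell output currents, in nA
    scanner = ShiftRegisterScanner(len(cells))

    scanner.clock(1)                  # a single token scans the cells one by one
    for _ in cells:
        print(scanner.state, scanner.read_current(cells))
        scanner.clock(0)

    scanner.reset()
    scanner.clock(1)
    scanner.clock(1)                  # two tokens select two adjacent cells
    print(scanner.state, scanner.read_current(cells))
```

Only the input, reset, and clock signals are needed to drive the model, which mirrors the small pin count that makes this scheme attractive.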

### 7.9.2 Tests and Measurements

It is essential for any design to be measured against standards and benchmarks related to that design. Unfortunately, no such standards have existed for vision chips so far. While any analog or digital design is reported with several standard tests and benchmarks, none of the vision chips reported include test results in a standard format. Each vision chip has at most a few specific test diagrams indicating the functionality of the chip under certain conditions (these conditions are often unstated or vaguely described).

If vision chips are going to be used in industrial applications, and if they are to be introduced to customers, they should be represented by some standard specifications.

The problem of quantifying the characteristics of a vision chip is very challenging. First of all, computer vision algorithms, on which vision chips are based, do not have any universally agreed standard for testing. Determining the quality of an algorithm has remained subjective. This has carried over directly to vision chips: determining the quality of the final output of a vision chip is subjective in the same sense.

However, as the accuracy of image processing performed on digital computers exceeds that of vision chips, and as algorithms implemented in vision chips are relatively simple, it is quite reasonable to compare the simulation results obtained on a computer to those obtained from the vision chip itself.

The second problem in testing vision chips is due to the diversity of vision tasks. There are many different vision tasks that may be implemented in a vision chip. Defining test patterns and standards to cover all aspects of the functionality of a vision chip, and for each vision task is very difficult.

Despite these problems, there are tests that can be performed on vision chips. These tests can easily quantify the reliability, the dynamic range for different operations, the speed, and so on. Here I introduce a few tests which can be performed on any vision chip. Before these, another important issue, namely test conditions, is addressed.

In any case, when reporting the performance of a vision chip, quoting absolute distance metrics is meaningless unless the exact characteristics of the optical system are given, as optical systems can vary from microscopes to telescopes. All parameters that involve distance should be reported in “pixel units”. For example, for motion detection chips a sensitivity of, say, 0.1 foot/s is meaningless, while a value of 0.1 pixel/s clearly indicates the performance of the chip, and the system designer can adapt this value to any absolute velocity sensitivity with the use of proper optics.
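How a figure in pixel units translates into an absolute velocity once the optics are fixed can be sketched with a pinhole (thin lens, distant object) approximation, in which the image is scaled by the ratio of focal length to object distance. The pixel pitch, focal lengths, and distances below are assumed values for illustration only.

```python
def pixel_rate_to_object_speed(pixel_rate, pixel_pitch_m, focal_length_m, distance_m):
    """Convert an image velocity in pixel/s into an object velocity in m/s.

    Assumes a pinhole model with the object much farther away than the focal
    length, and motion parallel to the image plane.
    """
    image_speed = pixel_rate * pixel_pitch_m          # speed on the chip surface, m/s
    return image_speed * distance_m / focal_length_m  # scaled back to the object plane


if __name__ == "__main__":
    sensitivity = 0.1   # pixel/s, the figure quoted for the chip
    pitch = 50e-6       # assumed pixel pitch: 50 um
    # The same chip behind two different lenses, viewing objects at two distances.
    print(pixel_rate_to_object_speed(sensitivity, pitch, 8e-3, 1.0), "m/s")
    print(pixel_rate_to_object_speed(sensitivity, pitch, 50e-3, 100.0), "m/s")
```

The same 0.1 pixel/s sensitivity corresponds to very different absolute speeds in the two setups, which is why the pixel-based figure is the one worth reporting.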

7.9.3 Test conditions

Conditions under which a vision chip is tested are very important. A mere statement of “room condition” or “under sunlight” does not represent an engineering method of testing. More quantified measures are required.

The main function of vision chips is to acquire and process light intensity in space and/or time. Therefore, a measure of the lighting conditions should be provided at the first instance. Some simple tests and measures for this purpose are described here.

- **Source of light.** It is important to know the source of light. This can be either sunlight or moonlight for natural lighting, or fluorescent, incandescent, halogen, laser, UV, and many other types of artificial light sources. Specifying the light source is important for determining both the spectrum of the input light and any additional components that might exist in a specific light source (for example, mains-powered light sources, like fluorescent and incandescent lamps, have an AC component in their output).

- **Method of exposing the chip.** Test patterns applied to the chip come either from light reflected from patterns on a Lambertian surface (for example, paper), or from transparent layers with light passing through them. In the latter case, the light source should be located behind the transparency. Although both methods are acceptable, the first is closer to real situations, as almost all objects in real environments are opaque with Lambertian surfaces.

- **Spectrum of the input light.** Except for laser light sources, the spectrum of all sources is broad. Some sources (such as UV sources) emit strongly at one wavelength while also producing light at other wavelengths.

- **Light intensity.** Whatever the light source, the amount of light to which the chip is exposed should be measured and reported. The light intensity can be measured either in *lux* or in watts per unit area.

- **Temperature.** Like all other integrated circuits, the operation of vision chips is affected by temperature.

- **Optical interface characteristics.** The spatial mapping of the outside world onto the surface of the vision chip is performed by the optical interface. The optical interface can range from a single lens to complicated lens structures. At least the following factors are important when using an optical interface:
  - Focal length
  - Depth of Focus
  - Aperture
  - Spectrum over which the lens does not show chromatic aberration, or alternatively the best spectrum for operation

There are instruments available for measuring each of the above quantities. For many light sources, the manufacturers may provide basic data, such as spectrum and spatial distribution patterns. For almost all optical interfaces the manufacturers provide extensive data.

Another test condition of great importance is the biasing of the chip. Vision chips usually have several biasing parameters to tune the functionality of the chip. Obviously, a chip may not function under all conditions, and tuning might be required for different tests. If the tuning (or adaptation) is a built-in or automated function, it does not need to be expressed in the test conditions. However, if manual tuning would be needed to adjust the chip for different tests, the chip should instead be left at the “best biasing conditions” for all tests. This restriction is intended to reflect how the chip would operate in a real application, and therefore to expose its reliability and total dynamic range.

### 7.9.4 Steady-state tests

This first set of tests determines the functionality of a vision chip when exposed to stationary patterns. The choice of the size and shape of patterns may be left to the designer, but there is a minimum set of patterns which can be tested without the need for a complicated setup.

Although for motion detection chips steady-state tests may seem unnecessary, under certain conditions (for example, very low light levels) these chips may produce outputs other than the “no-motion” output.

Steady-state tests can be divided into several groups as described in the following:

• Uniform patterns at different gray levels

• Vertical, horizontal, and oblique bar patterns at different spatial frequencies and different gray levels

• Checker board pattern at different spatial frequencies and gray levels

• Radial patterns with different angular frequencies and different gray levels

• Concentric rings with different radial frequencies and different gray levels

• Sinusoidally graded images in straight line, radial, and concentric patterns.

• Lenna’s image⁵

All the patterns should be large enough to cover the whole view field of the chip. The above patterns are meaningful only to 2-D vision chips. 1-D vision chips may simply use bar patterns, and sinusoidally shaded patterns.

Some of the test patterns are shown in Figure 7.46. These patterns can be generated by many graphical editing packages available on PCs and Unix workstations. Mathematical packages with graphical capabilities can also be used quite easily.
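As an illustration of how little is needed, the steady-state patterns listed above can also be generated with a short script. The sketch below uses only the Python standard library, works in pixel units, and writes plain-text PGM images; all function names are hypothetical, and the quantization to 8-bit gray levels is an assumption of this example rather than a requirement of the tests.

```python
import math


def make_pattern(width, height, intensity):
    """Sample intensity(x, y), with values in [0, 1], on a grid centred on the
    image and quantize it to 8-bit gray levels."""
    cx, cy = (width - 1) / 2, (height - 1) / 2
    return [[round(255 * intensity(x - cx, y - cy)) for x in range(width)]
            for y in range(height)]


def _square(cycles):
    """Unit square wave: 1 during the first half of each cycle, 0 otherwise."""
    return 1.0 if cycles % 1.0 < 0.5 else 0.0


def _scale(value, low, high):
    """Map a value in [0, 1] onto the gray-level range [low, high]."""
    return low + (high - low) * value


def bars(period_px, angle_deg=0.0, low=0.0, high=1.0):
    """Bar pattern; 0 degrees gives vertical bars, 90 degrees horizontal bars."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return lambda x, y: _scale(_square((x * c + y * s) / period_px), low, high)


def sinusoid(period_px, angle_deg=0.0, low=0.0, high=1.0):
    """Sinusoidally graded straight-line pattern."""
    c, s = math.cos(math.radians(angle_deg)), math.sin(math.radians(angle_deg))
    return lambda x, y: _scale(
        0.5 + 0.5 * math.sin(2 * math.pi * (x * c + y * s) / period_px), low, high)


def checkerboard(period_px, low=0.0, high=1.0):
    """Checkerboard whose squares are half a period wide."""
    return lambda x, y: _scale(
        float((math.floor(2 * x / period_px) + math.floor(2 * y / period_px)) % 2 == 0),
        low, high)


def radial(cycles_per_turn, low=0.0, high=1.0):
    """Radial (spoke) pattern with the given angular frequency."""
    return lambda x, y: _scale(
        _square(cycles_per_turn * math.atan2(y, x) / (2 * math.pi)), low, high)


def rings(period_px, low=0.0, high=1.0):
    """Concentric rings with the given radial period."""
    return lambda x, y: _scale(_square(math.hypot(x, y) / period_px), low, high)


def write_pgm(path, image):
    """Save a pattern as a plain-text (P2) PGM file."""
    with open(path, "w") as f:
        f.write(f"P2\n{len(image[0])} {len(image)}\n255\n")
        for row in image:
            f.write(" ".join(str(v) for v in row) + "\n")


if __name__ == "__main__":
    write_pgm("bars_oblique.pgm", make_pattern(256, 256, bars(32, angle_deg=45)))
    write_pgm("rings.pgm", make_pattern(256, 256, rings(24, low=0.2, high=0.8)))
```

A uniform pattern at a given gray level is obtained by passing a constant function, for example `make_pattern(256, 256, lambda x, y: 0.5)`.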

It should be noted that the final pattern on the chip depends on the characteristics of the optical interface. Only by taking the optical system into account can a true measurement of the spatial processing capabilities be made.

7.9.5 Spatio-temporal tests

Spatio-temporal tests involve another dimension for testing. Adding the temporal dimension to the tests is relatively simple. In fact by moving any of the patterns used for testing the steady-state functioning of vision chips in front of the chip under test, spatio-temporal responses can be obtained and measured. Step inputs are the most common stimuli applied to vision chips. The bar patterns can easily perform this test without the need for a special setup.
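For a simulated counterpart of such a test, a moving step edge can be generated as a sequence of 1-D frames. The sketch below is one possible way to do it: the edge position advances at a constant speed expressed in pixels per frame, and a pixel partly covered by the bright side of the edge takes a proportionally intermediate value. The function name and its parameters are hypothetical.

```python
def moving_step_frames(width, n_frames, speed_px_per_frame,
                       start_px=0.0, low=0.0, high=1.0):
    """Return n_frames 1-D frames of a bright-to-dark step edge moving to the right.

    Pixel x covers the interval [x, x + 1); everything left of the edge is at
    the high level and everything right of it at the low level.
    """
    frames = []
    for t in range(n_frames):
        edge = start_px + speed_px_per_frame * t
        frame = []
        for x in range(width):
            coverage = min(1.0, max(0.0, edge - x))  # fraction of the pixel that is bright
            frame.append(low + (high - low) * coverage)
        frames.append(frame)
    return frames


if __name__ == "__main__":
    for frame in moving_step_frames(8, 5, 0.5, start_px=2.0):
        print(" ".join(f"{v:.2f}" for v in frame))
```

The resulting frames can be compared with the response measured from the chip when the corresponding bar pattern is moved in front of it at a known speed in pixel units.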

For some special vision chips, or for quantitative characterization of the chip, special test setups may be necessary (for example, see [Meitzler 96, McQuirk 96b]).

⁵Lenna’s photo seems to be the most common benchmark image in the computer vision community.
Figure 7.46: Test patterns for steady-state testing of vision chips.
Figure 7.1: Lenna’s photo.
Appendix A

Other resources

There are many resources that an interested reader can refer to. For basic analog VLSI circuits and systems, Carver Mead’s book, *Analog VLSI and Neural Systems* [Mead 89b], provides an excellent *systematic* overview of the design of neuromorphic systems. There are several other published books which can be of some help, for example [Ismail and Fiez 94, Mead and Ismail 89, Sheu and Choi 95, Mahowald 94b]. There is also a paper collection from IEEE Press edited by C. Koch and H. Li [Koch and Li 94] which is an easy reference to publications on many vision chips.

Unfortunately, there is still no properly edited book on vision chips that covers the various aspects of their design. Although some books on analog VLSI contain relevant information, it appears only as examples of neural systems implemented in VLSI. They lack a systematic approach to the design of vision chips.

For online material, one of the best locations to check is the WWW and FTP site at Caltech, at “www.pcmp.caltech.edu” and “ftp.pcmp.caltech.edu”, respectively. In particular, Tobi Delbrück’s home page has many articles, reports, and his resourceful PhD thesis covering various aspects of the design of vision chips, especially the design and characterization of photodetectors and photocircuits.

In the “html” format of this report, the home page of each vision chip has links to the home pages of the designers of the chip, to postscript papers, and to other online material. The reader is highly encouraged to follow those links. There are also links to the home pages of the groups and researchers working in this area, which can be accessed from the main home page of this document at “http://www.eleceng.adelaide.edu.au/Groups/GAAS/Bugeye/visionchips/”.

In the bibliography I have tried to include as many references to each work as possible. This is especially helpful when a particular publication is not accessible, but alternative sources in which the work has been presented are.
Appendix B

About this report

This report has been typeset using \LaTeX. Most figures have been drawn using Idraw, the drawing tool in the InterViews 3.1 package from Stanford. A few figures requiring additional description on a postscript file have been edited using Xfig2.1p8. New figures in this revision have been drawn using Tgif3.0-10. Circuit diagrams have been drawn using Xcircuit, an electronic circuit drawing tool written by Tim Edwards from Johns Hopkins University\(^1\). Images such as Lenna’s photo have been inserted as compressed postscript outputs from XV3.10a. Some simulation results from MATLAB, in encapsulated postscript format, may not be printed by some postscript printers. The final .dvi file has been translated to postscript using dvips5.58f.

This set of tools is known to generate the most compatible postscript output for printing on various postscript printers. Although I had access to other WYSIWYG desktop publishing tools, such as FrameMaker and Interleaf, I did not use them because the postscript output of these tools is not compatible with all postscript printers. Also, the printed fonts from \LaTeX seem to be more visually pleasing than their counterparts from these WYSIWYG tools.

The HTML format has been directly generated from the latex files using latex2html96.1 written by Nikos Drakos from the University of Leeds. The accompanying htmlsty file, which is a style file for including html inside latex files, has also been used for the links that appear in the html format, but not in the postscript format.

The colorful figures are intended not only to be more visually attractive, but also to help in understanding the concepts. I found it particularly hard to understand some of the figures in the original publications, not just because they were not colorful, but because they were either under-described, over-simplified, or over-crowded. I believe “a picture is worth a thousand words”.

I have tried to force the description of each chip to start from a new page (using \pagebreak and \newpage commands). \LaTeX does not provide a well-placed text and figure when there are too many small sections with many figures. Starting each section from a new page has produced some extra blank space in the document, but has improved the readability and “figure-finding” features of the document.

\(^1\)This schematic editor accepts input from AnaLog (the analog VLSI simulator written by John Lazzaro from Caltech)
References


