# A Parallel 32x32 Time-To-Digital Converter Array Fabricated in a 130 nm Imaging CMOS Technology

M. Gersbach<sup>1,5</sup>, Y. Maruyama<sup>5</sup>, E. Labonne<sup>1</sup>, J. Richardson<sup>2,3</sup>, R. Walker<sup>3</sup>, L. Grant<sup>2</sup>, R. Henderson<sup>3</sup>, F. Borghetti<sup>4</sup>, D. Stoppa<sup>4</sup>, and E. Charbon<sup>1,5</sup>

<sup>1</sup>Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland <sup>2</sup>ST Microelectronics Imaging Division, Edinburgh, Scotland <sup>3</sup>University of Edinburgh, Edinburgh, Scotland <sup>4</sup>Fondazione Bruno Kessler, Trento, Italy <sup>5</sup>TU Delft, Delft, The Netherlands

Abstract— We report on the design and characterization of a 32 x 32 time-to-digital converter (TDC) array implemented in a 130 nm imaging CMOS technology. The 10-bit TDCs exhibit a timing resolution of 119 ps with a timing uniformity across the entire array of less than 2 LSBs. The differential and integral non-linearity (DNL and INL) were measured at $\pm 0.4$ and $\pm 1.2$ LSBs, respectively. The TDC array was fabricated with a pitch of 50 µm in both directions and with a total TDC area of less than 2000 µm<sup>2</sup>. The characteristics of the array make it an excellent candidate for in-pixel TDCs in time-resolved imagers for applications such as 3-D imaging and fluorescence lifetime imaging microscopy (FLIM).

(Keywords: TDC, TDC array, TCSPC, SPAD, FLIM)

## I. INTRODUCTION

Time-resolved imaging has been a rapidly growing field of investigation in recent years, as it offers several advantages over traditional intensity imaging. In machine vision, for example, time-resolved imaging makes it possible to construct the depth map of a scene. In the life sciences, time-resolved imaging has enabled the emergence of fluorescence lifetime imaging, a quantitative imaging method that locally probes the chemical environment of a fluorophore in living cells.

So far, most time-correlated single-photon counting (TCSPC) setups were based on a single detector, often a single-photon avalanche diode (SPAD) [1] or a photomultiplier tube (PMT), an external chronometer, often a time-to-digital converter (TDC) [2],[3],[4], and an optical scanner to reconstruct an image. The integration of SPADs into CMOS technology [5] has significantly improved the level of miniaturization of SPADs and thus paved the way for large SPAD arrays [6]. Early implementations of SPAD arrays, however, did not integrate TDCs on chip, and thus only one pixel was active at a time [6],[7],[8]. Later, larger SPAD arrays (128 x 128) were integrated with up to 32 parallel TDCs, allowing an entire row of pixels to be active simultaneously [9]. However, in order to acquire fast images over the entire array, it is necessary to integrate many more TDCs on chip, possibly one per detector, so as to enable independent and simultaneous acquisition and time discrimination on a per-pixel basis. To achieve this, very compact TDCs must be designed. However, due to the complexity of these devices, only a deep-submicron implementation may be viable.

This work has been supported by the European Community within the Sixth Framework Programme IST FET Open.

Disclaimer: This publication reflects only the authors' views. The European Community is not liable for any use that may be made of the information contained herein.

In this paper we report on the design and characterization of an array of 1024 TDCs implemented in a 130 nm imaging CMOS process. Each TDC was coupled to a low-noise SPAD based on [11] and [12]. To the best of our knowledge, this is one of the largest arrays of fully integrated TDCs ever built. The 1024 TDCs operate independently and simultaneously at 500 kS/s. The current readout speed is limited by the external hardware, while in principle 1 MS/s could be reached. Each TDC/SPAD ensemble measures only 50 x 50 µm<sup>2</sup>. It is thus one of the smallest ever demonstrated with deep sub-nanosecond resolution.

## II. DESIGN

A TDC takes two inputs: a START and a STOP signal and computes the time elapsed between them. Each 10-bit TDC consists of a two-level (coarse and fine) interpolator activated by the digital pulse from a SPAD upon photon detection (START signal). The coarse interpolator consists of a 6-bit ripple counter clocked by an on-chip PLL.



Figure 1. Simplified block diagram of the TDC. The START signal of the TDC may be generated globally by an external pulse or locally via a SPAD.

The PLL clock of 280 MHz is distributed across the entire pixel array and its frequency is doubled to 560 MHz in the pixel. Thus, a coarse resolution of 1.79 ns is achieved. The fine interpolator further divides each clock cycle into 16 periods by sending the START signal through a delay chain consisting of 16 buffers. In order to minimize jitter due to supply noise on the buffer propagation delay, a differential buffer architecture was chosen. The propagation delay of the buffers can be adjusted by tuning the gate voltage of the NMOS transistors controlling the tail current. Thanks to a calibration loop (Cal. signal), the delay chain is tuned to match the clock period.
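As a quick check of the figures quoted above, and assuming the calibrated delay chain spans exactly one period of the doubled clock, the coarse step and the ideal fine step work out as follows (the symbols $T_{\text{coarse}}$ and $T_{\text{fine,ideal}}$ are introduced here for illustration only):

$$T_{\text{coarse}} = \frac{1}{560\ \text{MHz}} \approx 1.79\ \text{ns}, \qquad T_{\text{fine,ideal}} = \frac{T_{\text{coarse}}}{16} \approx 112\ \text{ps}$$

This is an idealized estimate; the bin width actually measured in Section III is 119 ps.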

On the first clock edge after the START signal the propagation of the pulse through the delay chain is stopped, thus the number of buffer elements that toggled between the photon detection and the subsequent clock edge corresponds to the time elapsed between these two events. A coder converts the resulting thermometer code of the fine interpolator output into a 4-bit binary number. Finally, the STOP signal interrupts the coarse interpolator (ripple counter). The coded total elapsed time is the combination of the output of the coarse and fine interpolators resulting in a 10-bit code. Figure 2 shows the conversion scheme pictorially.



Figure 2. Simplified TDC conversion scheme.
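The conversion described above can be illustrated with a short behavioral sketch. It is not the circuit itself: the function names, the packing of the coarse count into the upper six bits, and the saturation of a fully toggled chain at the largest 4-bit value are all assumptions made for illustration.

```python
COARSE_BITS = 6   # ripple counter width (from the text)
FINE_BITS = 4     # coder output width (from the text)
FINE_TAPS = 16    # buffers in the delay chain (from the text)


def thermometer_to_binary(taps):
    """Count the buffers that toggled before the chain was frozen.

    `taps` holds one 0/1 value per buffer, in chain order. A clean
    thermometer code is a run of 1s followed by a run of 0s.
    """
    taps = list(taps)
    if len(taps) != FINE_TAPS:
        raise ValueError("expected one value per delay-chain buffer")
    count = sum(taps)
    if taps != [1] * count + [0] * (FINE_TAPS - count):
        raise ValueError("not a valid thermometer code")
    # Assumption: a fully toggled chain saturates at the largest 4-bit value.
    return min(count, (1 << FINE_BITS) - 1)


def combine(coarse_count, fine_value):
    """Pack coarse and fine results into one 10-bit code.

    Assumption: the coarse count occupies the upper 6 bits and the
    fine value the lower 4 bits.
    """
    if not 0 <= coarse_count < (1 << COARSE_BITS):
        raise ValueError("coarse count out of range")
    if not 0 <= fine_value < (1 << FINE_BITS):
        raise ValueError("fine value out of range")
    return (coarse_count << FINE_BITS) | fine_value


# Example: 5 buffers toggled, then 3 full clock cycles until STOP.
example_code = combine(3, thermometer_to_binary([1] * 5 + [0] * 11))
```

With this packing, one step of the coarse counter is worth 16 fine steps, which matches the division of each clock cycle into 16 periods.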

Note that the STOP signal, given by the laser's reference signal in most TCSPC applications, is used as input clock for the PLL and thus the rising edge of the STOP signal always falls on a rising edge of the distributed clock. The configuration in which the photon detection event is used as START signal and the laser's reference signal as STOP signal is known as "reverse start-stop configuration". The main advantage of this mode of operation is that the TDC does not start if no photon has been detected, thus saving considerable power as most TCSPC applications, such as FLIM, involve low illumination intensities. However, the reverse start-stop configuration can only be used with pulsed lasers having a very stable repetition rate as the STOP signal is taken from the laser's synchronization signal for the subsequent laser pulse.
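Because the STOP edge belongs to the subsequent laser pulse, the measured interval runs "backwards" with respect to the photon delay. A minimal sketch of the conversion is given below; it assumes that the photon arrives within one laser period of the pulse that produced it and ignores fixed cable and optical offsets. The function name and arguments are hypothetical.

```python
def arrival_delay_ns(measured_ns, laser_period_ns):
    """Photon delay after its laser pulse, in reverse start-stop mode.

    `measured_ns` is the START-to-STOP interval reported by the TDC and
    `laser_period_ns` is the (stable) repetition period of the laser.
    """
    if not 0 <= measured_ns <= laser_period_ns:
        raise ValueError("interval must lie within one laser period")
    return laser_period_ns - measured_ns


# Example with a 25 ns period, i.e. the 40 MHz repetition rate of Section III.
example_delay = arrival_delay_ns(20.0, 25.0)
```

The subtraction only holds if the repetition period is constant, which is why the configuration requires a laser with a very stable repetition rate.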

After acquisition of the timing data, the code is stored within a 10-bit memory placed inside the pixel until it can be read out of the array. A rolling-shutter type readout allows reading 500'000 10-bit TDC results out of the chip per second and per pixel. The array is divided into two 16x32 sub-arrays and each half-column of 16 pixels shares a data bus consisting of ten parallel lines. A Y-decoder activates all the rows consecutively and the data in the memory of each pixel in the half-column is sent through the data bus when activated. The data bus of each half-column is connected to a pad via a serialiser. Thus, a total of 64 IO pads, each operating at 80 MHz, are used to read out a total of 5.12 Gbit/s of data. A 500 kHz signal is used to reset all the TDCs in the array simultaneously.
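The readout figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that each pad carries one bit per 80 MHz cycle; the variable names are illustrative.

```python
PIXELS = 32 * 32
BITS_PER_RESULT = 10
RESULTS_PER_SECOND = 500_000   # per pixel
IO_PADS = 64
PAD_RATE_HZ = 80_000_000       # assumption: one bit per pad per cycle

# Data produced by the array versus data the pads can carry.
generated_bps = PIXELS * BITS_PER_RESULT * RESULTS_PER_SECOND
pad_capacity_bps = IO_PADS * PAD_RATE_HZ

# Two sub-arrays of 32 columns each give one serialiser per half-column.
half_columns = 2 * 32

print(generated_bps / 1e9, pad_capacity_bps / 1e9, half_columns)
```

Both rates come to 5.12 Gbit/s, and the number of half-columns equals the number of IO pads.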

Figure 3 shows a block diagram of the chip. The system comprises an array of 32x32 pixels, each with a TDC and a SPAD, an on-chip PLL, and an I<sup>2</sup>C block to program all modes of operation for all the components of the sensor. A photomicrograph of the system is shown in Figure 4.



Figure 3. Schematic of the rolling shutter type readout. Each TDC produces a 10-bit code 500'000 times per second. If no photon has been detected and thus the TDC has not been active, a specific code is read out.



Figure 4. Photomicrograph of the chip containing the 32x32 TDC and SPAD ensemble array, a PLL, an I<sup>2</sup>C block, and readout electronics.

## III. CHARACTERIZATION

Each TDC in the array is coupled to a SPAD, thus the TDCs were tested using a pulsed laser source. The laser emits short (<40 ps) light pulses at a frequency of 40 MHz (Advanced Laser Diode Systems GmbH, Germany). The START signal of each TDC is triggered by a photon detected by the corresponding SPAD, while the STOP is given by the laser's electrical synchronization pulse. Thus, the time-of-flight (TOF) of the laser pulse can be measured and the accuracy of the TDC can be assessed by changing the laser-chip distance. The timing accuracy at FWHM of the entire system comprising laser (~40 ps of timing jitter), SPAD (~144 ps [11]) and TDC is ~238 ps (or ~2 LSBs). Assuming that the jitter is a random process resulting from statistically independent sources (SPAD, TDC, and laser), the contribution of the TDC alone can be estimated by subtracting the other two in quadrature:

$$\sqrt{238^2 - 144^2 - 40^2} \approx 185 \, \text{ps}$$
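The quadrature subtraction can be reproduced numerically. The helper below is a hypothetical illustration; it assumes independent, approximately Gaussian contributions, for which widths quoted at FWHM combine in quadrature in the same way as standard deviations.

```python
import math


def subtract_in_quadrature(total, *parts):
    """Remove independent contributions from a total width, in quadrature."""
    residual_sq = total ** 2 - sum(p ** 2 for p in parts)
    if residual_sq < 0:
        raise ValueError("contributions exceed the total")
    return math.sqrt(residual_sq)


# System, SPAD and laser widths in ps, as quoted in the text.
tdc_fwhm_ps = subtract_in_quadrature(238.0, 144.0, 40.0)
print(round(tdc_fwhm_ps))
```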

The readout breadboard used for testing the TDC array is based on a dual VirtexII<sup>™</sup> FPGA system. The top panel of Figure 5 shows the measured time delay for a laser-chip distance of up to 3 meters with the theoretical delay shown as reference value while the bottom panel shows the typical histogram obtained when repeating the measurement for one specific distance.



Figure 5. Top: TOF measurements and theoretical delay for different laser-chip distances. Bottom: A typical histogram for repeated TOF measurements with a constant delay.
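The theoretical delay used as a reference follows directly from the speed of light. The sketch below assumes a one-way path in air (refractive index taken as 1) and uses the measured bin width of 119 ps to express the delay in LSBs; the function names are illustrative.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by SI definition
LSB_PS = 119.0                        # measured bin width from the text


def expected_delay_ps(distance_m):
    """One-way propagation delay over `distance_m`, in picoseconds."""
    return distance_m / SPEED_OF_LIGHT_M_PER_S * 1e12


def expected_code_shift(distance_m):
    """The same delay expressed in TDC bins (LSBs)."""
    return expected_delay_ps(distance_m) / LSB_PS


# The largest laser-chip distance in the measurement is 3 m.
delay_3m_ps = expected_delay_ps(3.0)
shift_3m_lsb = expected_code_shift(3.0)
```

For 3 m this gives a delay of about 10 ns, or roughly 84 bins.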

A histogram of photon arrival times under uncorrelated light was acquired in order to assess the DNL and INL. From the count density distribution histogram, the DNL was obtained by dividing the count in each bin by the average count over all the bins in the histogram and subtracting one. Then, the INL was calculated by integrating the DNL. The DNL and INL were measured in a range of $\pm 0.4$ LSB and $\pm 1.2$ LSB respectively, with a bin width of 119 ps. The results are shown in Figure 6.
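A minimal sketch of this code-density calculation is given below. It assumes that the DNL is expressed as the relative deviation of each bin from the mean count (count/mean minus one) and that the INL is the running sum of the DNL; the function name and the example histogram are made up for illustration and are not measured data.

```python
def dnl_inl(histogram):
    """Code-density estimate of DNL and INL, both in LSB.

    `histogram` holds the number of events recorded in each TDC bin
    under uncorrelated illumination.
    """
    if not histogram or sum(histogram) == 0:
        raise ValueError("histogram must contain counts")
    mean = sum(histogram) / len(histogram)
    # DNL: relative deviation of each bin width from the average bin.
    dnl = [count / mean - 1.0 for count in histogram]
    # INL: running sum (discrete integral) of the DNL.
    inl = []
    running = 0.0
    for d in dnl:
        running += d
        inl.append(running)
    return dnl, inl


# Synthetic example: one bin 20% too wide, its neighbour 20% too narrow.
example_dnl, example_inl = dnl_inl([100, 120, 80, 100])
```

Because the DNL values sum to zero by construction, the INL returns to zero at the last bin.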

The TDC uniformity was measured by shining a pulsed laser onto the chip at a constant distance. The top panel of Figure 7 shows the resulting map of time delays for each pixel. The bottom panel (with zoom inset) shows the statistics of the time measurements. Over 80% of the TDCs exhibit an error of less than 2 LSBs.



Figure 6. Measured DNL and INL with a timing resolution (bin width) of 119 ps.



Figure 7. TDC output uniformity across the array for a TOF of 13.2ns. No significant change was observed for other time delays.

Our TDCs are highly scalable. The suitability of the TDC for larger arrays was verified by measuring the current dissipation trend with an increasing number of active TDCs, as shown in Figure 8. A globally generated clock was preferred over a local pixel-level clock to reduce the power dissipation in the pixel to a minimum, at the cost of increased off-pixel power dissipation. Furthermore, since the TDC is event-driven (triggered by the SPAD), the power consumption is lower at low illumination levels, and the detector noise caused by local heating should also be minimal. When few pixels are enabled, the power consumption is dominated by the PLL and clock distribution. When several hundred pixels are enabled, the pixels begin to dominate. From Figure 8 we can conclude that the PLL and clock distribution consume ~42 mA, while the entire chip at maximum activity consumes ~78 mA. Table I summarizes the characterization results.



Figure 8. Core current consumption (without I/O) at 1.2V supply for two photon detection rates.
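The two current figures allow a rough per-TDC estimate. The sketch below assumes that "maximum activity" corresponds to all 1024 TDCs being enabled and that the current scales linearly with the number of active TDCs; neither assumption is stated in the text, so the result is indicative only.

```python
SUPPLY_V = 1.2       # core supply, from the Figure 8 caption
BASELINE_MA = 42.0   # PLL and clock distribution
FULL_MA = 78.0       # entire chip at maximum activity
N_TDCS = 1024        # assumption: all TDCs active at maximum activity

# Average current attributable to one active TDC, in microamperes.
per_tdc_ua = (FULL_MA - BASELINE_MA) / N_TDCS * 1000.0

# Total core power at maximum activity, in milliwatts.
core_power_mw = FULL_MA * SUPPLY_V

print(f"{per_tdc_ua:.1f} uA per TDC, {core_power_mw:.1f} mW core power")
```

Under these assumptions each active TDC draws on the order of 35 µA, and the core dissipates a little under 100 mW.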

| Performance         | Typical | Max.  | Unit |
|---------------------|---------|-------|------|
| Time resolution     | 119     | 111   | ps   |
| Bit resolution      | 10      |       | bits |
| Time jitter @ FWHM  | 185     |       | ps   |
| DNL                 |         | ± 0.4 | LSB  |
| INL                 |         | ± 1.2 | LSB  |
| Uniformity          | 2       |       | LSB  |
| Current Consumption | 78      |       | mA   |
| Range               | 100     |       | ns   |
| Measurement rate    | 500     | 1000  | kHz  |

TABLE I. PERFORMANCE SUMMARY OF THE TDC ARRAY AT ROOM TEMPERATURE.

## IV. CONCLUSION

A 32x32 TDC array suited for time-resolved imaging applications such as 3-D imaging or fluorescence lifetime imaging was fabricated and tested. A time resolution of 119 ps was achieved with an accuracy of ~2 LSBs and excellent uniformity across the array. DNL and INL were measured at $\pm 0.4$ and $\pm 1.2$ LSBs respectively. Finally, the in-pixel power consumption was limited by event-driven operation and the use of out-of-pixel clock generation to avoid local heating that could deteriorate the detector noise performance.

## V. REFERENCES

- [1] S. Cova, A. Longoni, A. Andreoni, R. Cubeddu, "A semiconductor detector for measuring ultraweak fluorescence decays with 70ps FWHM resolution", IEEE J. of Quantum Electron., vol. 19 (4), pp. 630-634, 1983.
- [2] P. Dudek, S. Szczepanski and J. V. Hatfield, "A high-resolution CMOS time-to-digital converter utilizing a Vernier delay line", IEEE J. Solid-State Circuits, vol. 35, pp. 240-247, 2000.
- [3] B. K. Swann *et al.*, "A 100-ps time-resolution CMOS time-to-digital converter for positron emission tomography imaging applications", IEEE J. Solid-State Circuits, vol. 39, no. 11, pp. 1839-1852, 2004.
- [4] J.-P. Jansson, A. Mäntyniemi and J. Kostamovaara, "A CMOS time-to-digital converter with better than 10 ps single-shot precision", IEEE J. Solid-State Circuits, vol. 41, no.6, pp. 1286-1296, 2006
- [5] A. Rochas *et al.*, "Single photon detector fabricated in a complementary metal-oxide-semiconductor high-voltage technology", Rev. of Sci. Instr., vol. 74 (7), pp. 3263-3270, 2003.
- [6] C. Niclass, A. Rochas, P.A. Besse, E. Charbon, "Design and characterization of a CMOS 3-D image sensor based on single photon avalanche diodes" IEEE J. of Solid-State Circuits, vol. 40, pp. 1847-1854, 2005.
- [7] D. Stoppa *et al.*, "A CMOS 3-D Imager based on Single Photon Avalanche Diode", IEEE Transactions on Circuits and Systems I, vol. 54, No. 1, 2007.
- [8] F. Zappa, S. Tisa, A. Gulinatti, A. Gallivanoni, S. Cova, "Monolithic CMOS detector module for photon counting and picosecond timing," Proc. ESSDERC, pp. 341–344, 2004.
- [9] C. Niclass, C. Favi, T. Kluter, M. Gersbach, E. Charbon, "A 128 x 128 single-photon image sensor with column-level 10-bit time-to-digital converter array", IEEE J. of Solid-State Circuits, vol. 43, pp. 2977-2989, 2008.
- [10] M.A. Marwick and A.G. Andreou, "Single photon avalanche photodetector with integrated quenching fabricated in TSMC 0.18µm CMOS process", Electronics Letters, vol. 44, no.10, 2008
- [11] C. Niclass, M. Gersbach, R. Henderson, L.Grant, E. Charbon, "A single photon avalanche diode implementation in 130-nm CMOS technology", IEEE J. of Sel. Top. in Quantum Electron., vol. 13, pp. 863-869, 2007.
- [12] M. Gersbach *et al.*, "A low-noise single photon detector implemented in a 130nm CMOS process", Solid-State Electronics, to appear.
- [13] M. Cohen *et al.*, "Fully optimized Cu based process with dedicated cavity etch for 1.75µm and 1.45µm pixel pitch CMOS image sensors", IEDM, 2006.