## A Multi-Channel, 10ps Resolution, FPGA-Based TDC with 300MS/s Throughput for Open-Source PET Applications

Harmen Menninga, Claudio Favi, Matthew W. Fishburn, and Edoardo Charbon

H. Menninga, M. Fishburn, and E. Charbon are with the Delft University of Technology, The Netherlands. C. Favi is with the École Polytechnique Fédérale de Lausanne, Switzerland.

Recent research in positron emission tomography (PET) has created a demand for open-source, reconfigurable time-to-digital converters (TDCs) implemented in field-programmable gate arrays (FPGAs) [1]. When time-of-flight (TOF) techniques are used for image reconstruction, the TDC resolution must be below 100ps. In this work we describe the architecture and characterization of an FPGA-based TDC with 10ps resolution and 300MS/s throughput, targeting PET applications. The best implemented TDCs show a differential non-linearity (|DNL|) below 2.0LSB and an integral non-linearity (|INL|) below 2.5LSB. Variations in TDC performance are characterized as a function of location within the FPGA, across different FPGAs, and over temperature.

**Architecture** - The TDC architecture is based on a delay line used in previous research [2] and maps directly onto the structure of an existing FPGA. A Virtex-6 FPGA (XC6VLX240T) consists of slices containing look-up tables (LUTs), additional logic, and carry chains normally used for fast addition. As shown in Fig. 1, the carry chain can be used as a delay line that forms the high-resolution (fine) portion of the TDC, while a counter forms the coarse portion. The measurement period of the realized architecture is one clock cycle (about 1.67ns at 600MHz). A second clock cycle is used for readout and reset, so one sample is taken every two cycles, giving a total throughput of 300MS/s. A higher clock frequency shortens the delay line needed to span one period, so the line crosses fewer clock regions and higher accuracy can be achieved.
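The timing arithmetic above can be made concrete with a small Python sketch of how one sample from a carry-chain TDC might be decoded. Beyond what is stated in the text, it assumes that the sampled carry chain is read as a thermometer code, that the fine bin is obtained by counting ones (a common decoding that is less sensitive to isolated bubbles), that all bins have the nominal uniform 10ps width, and that the fine count is subtracted from the time of the sampling clock edge. The names `fine_bin`, `timestamp_ps`, and `throughput_sps` are hypothetical.

```python
import math

# Hypothetical decoder for one carry-chain TDC sample (illustrative only).
CLOCK_HZ = 600e6                       # clock frequency from Table 1
T_CLK_PS = 1e12 / CLOCK_HZ             # one measurement period, about 1666.7 ps
LSB_PS = 10.0                          # nominal fine-bin width at 25 °C
N_TAPS = math.ceil(T_CLK_PS / LSB_PS)  # taps needed to span one period


def throughput_sps(clock_hz: float, cycles_per_sample: int = 2) -> float:
    """One cycle to measure, one cycle to read out and reset."""
    return clock_hz / cycles_per_sample


def fine_bin(thermometer: list[int]) -> int:
    """Ones-counting decode of the sampled carry chain."""
    return sum(thermometer)


def timestamp_ps(coarse_count: int, thermometer: list[int]) -> float:
    """Assumed convention: the hit entered the chain fine_bin taps before the
    sampling edge, and that edge occurs after coarse_count clock periods."""
    return coarse_count * T_CLK_PS - fine_bin(thermometer) * LSB_PS


if __name__ == "__main__":
    print(N_TAPS, throughput_sps(CLOCK_HZ))
    print(timestamp_ps(3, [1, 1, 1, 0, 0]))
```

With a 600MHz clock, about 167 nominal 10ps taps are needed to span one period, which is why a faster clock keeps the line short. Because real bins are not uniform, a practical decoder would replace the constant `LSB_PS` with a calibrated bin-to-time table.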

**Static Non-Linearities** - The FPGA's clock distribution affects delay-line performance: clock skew is observed wherever the delay line crosses a clock region. A visualization of the clock distribution and its influence is shown in Fig. 2. Avoiding clock-region crossings yields better non-linearity, so correct delay-line placement is essential to keep the non-linearity low. Other static non-linearities are caused by process variation across the chip: transistor properties differ from one location to another, so identical delay lines placed at different positions behave differently. This effect was tested by placing and measuring, one at a time, 161 delay lines across the chip. The position dependence of the INL range (maximum minus minimum INL) is plotted in Fig. 3; it varies from around 8LSB to less than 4LSB. Taking all static non-linearities into account, the best delay-line placement was implemented; its characteristics are shown in Fig. 4. The chip-to-chip variation was also measured, with the results shown in Fig. 5. In the worst case, the mean chip-to-chip variation between identically placed delay lines is only 0.02LSB in DNL, implying that similar results can be obtained with different chips.
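The placement search and the chip-to-chip comparison described above reduce to simple statistics, sketched here in Python. The function names and site labels are hypothetical; the per-site INL and DNL arrays would come from measurements such as a code density test, and the toy numbers in the usage example are illustrative only, not data from this work.

```python
def nonlinearity_range(values: list[float]) -> float:
    """Range as used in Fig. 3: maximum minus minimum, in LSB."""
    return max(values) - min(values)


def best_placement(inl_by_site: dict[str, list[float]]) -> tuple[str, float]:
    """Return the candidate site whose INL range is smallest, and that range."""
    site = min(inl_by_site, key=lambda s: nonlinearity_range(inl_by_site[s]))
    return site, nonlinearity_range(inl_by_site[site])


def mean_abs_dnl_difference(dnl_a: list[float], dnl_b: list[float]) -> float:
    """Mean |DNL_a - DNL_b| over bins for identically placed lines on two chips."""
    if len(dnl_a) != len(dnl_b) or not dnl_a:
        raise ValueError("DNL arrays must be non-empty and equally long")
    return sum(abs(a - b) for a, b in zip(dnl_a, dnl_b)) / len(dnl_a)


if __name__ == "__main__":
    # Toy INL curves for two hypothetical placement sites (not measured data).
    toy_inl = {"site_a": [0.0, 2.0, -1.5, 0.5], "site_b": [0.0, 1.0, -0.5, 0.2]}
    print(best_placement(toy_inl))  # ('site_b', 1.5)
```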

**Dynamic Non-Linearity Uniformity** - Temperature and voltage variations affect TDC performance. Voltage levels were measured using the on-chip system monitor, and their variations were found to be negligible. Temperature variations change the propagation delay of the delay elements, as seen in Fig. 6, where the resolution changes from 9.8ps at 10°C to 10.48ps at 60°C. Continuous trimming is therefore required to correct the mapping from output bin number to time; calibration only at FPGA start-up is not sufficient.
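One possible way to implement continuous trimming, sketched here as an assumption rather than the authors' circuit, is to keep a sliding window of fine-bin hits from events that are uncorrelated with the clock and to rebuild the bin-to-time table from their histogram. Each bin's share of the hits is then proportional to its current width, so the table follows temperature drift. The class `RunningCalibration` and its parameters are hypothetical.

```python
from collections import deque


class RunningCalibration:
    """Hypothetical sliding-window bin-to-time calibration.

    Assumes hits uniformly distributed over one clock period, so that
    width_k = t_clk * h_k / N; bin k is mapped to the center of its width.
    """

    def __init__(self, n_bins: int, t_clk_ps: float, window: int):
        self.n_bins = n_bins
        self.t_clk_ps = t_clk_ps
        self.hits = deque(maxlen=window)  # most recent fine-bin codes
        self.counts = [0] * n_bins        # histogram of the codes in the window

    def add_hit(self, fine_bin: int) -> None:
        if not 0 <= fine_bin < self.n_bins:
            raise ValueError("fine bin out of range")
        if len(self.hits) == self.hits.maxlen:
            # The oldest hit is about to be evicted by append(); drop its count.
            self.counts[self.hits[0]] -= 1
        self.hits.append(fine_bin)
        self.counts[fine_bin] += 1

    def bin_centers_ps(self) -> list[float]:
        total = sum(self.counts)
        if total == 0:
            raise ValueError("no hits collected yet")
        centers, edge = [], 0.0
        for h in self.counts:
            width = self.t_clk_ps * h / total
            centers.append(edge + width / 2)
            edge += width
        return centers
```

A sliding window trades tracking speed against statistical noise: a short window follows temperature changes quickly but gives noisier bin widths.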

**Single-Channel Results** - After accounting for clock distribution and process variation, the delay line is placed using a placement constraint, and the rest of the logic is mapped and routed automatically. Fig. 4 shows characterization results from a code density test [3]. The best implemented TDC has a DNL range of [-1, 1.5]LSB and an INL range of [-2.25, 1.61]LSB. Simulations of TDCs placed side by side suggest that even higher throughput and accuracy can be achieved at the cost of area. Taking technology, clock distribution, and PVT variations into account during design and implementation makes high-performance FPGA-based TDCs achievable, as the results summarized in Table 1 show.
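The code density test can be summarized in a few lines of Python. The sketch assumes hits uniformly distributed over the clock period, a histogram restricted to the bins that actually span one period, and the common convention of INL as the running sum of DNL; other INL conventions (for example, offsetting by half a bin) exist. `dnl_inl_from_histogram` is a hypothetical helper.

```python
from itertools import accumulate


def dnl_inl_from_histogram(counts: list[int]) -> tuple[list[float], list[float]]:
    """Code density test: a bin's count is proportional to its width, so
    DNL_k = h_k / mean(h) - 1 (in LSB), and INL_k is the running sum of DNL."""
    if not counts or sum(counts) == 0:
        raise ValueError("histogram is empty")
    mean = sum(counts) / len(counts)
    dnl = [h / mean - 1.0 for h in counts]
    inl = list(accumulate(dnl))
    return dnl, inl


if __name__ == "__main__":
    dnl, inl = dnl_inl_from_histogram([100, 150, 50, 100])
    print("DNL range:", min(dnl), max(dnl))
    print("INL range:", min(inl), max(inl))
```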

**Multi-Channel Results** - Simulation and implementation setups were built to test the effect of combining multiple TDCs on-chip. Implementing multiple TDCs in an FPGA involves a trade-off between resolution, accuracy, throughput, and area. The effect of placing multiple TDCs close to each other was also investigated; it shows that a guard ring of one slice between TDCs is necessary. This guard ring consists of slices inside the FPGA where no logic is placed. Simulations based on the measured characteristics of the FPGA show that the resolution can be improved to around 1ps, or the accuracy to around 1LSB. Simulations also suggest that multiple delay lines can be multiplexed to increase throughput beyond 300MS/s.
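The resolution gain from combining delay lines can be illustrated with an idealized Python simulation. It assumes, hypothetically, that N delay lines measure the same hit with identical uniform 10ps bins whose edges are offset by evenly spaced fractions of an LSB, and that the N bin centers are averaged. Real placement-induced offsets would be uneven, and this need not be the authors' combining method. Under these assumptions the average behaves like a single quantizer with step LSB/N, so ten lines approach the 1ps figure mentioned above. Throughput multiplexing is modeled simply as N lines each sustaining 300MS/s.

```python
import math
import random

LSB_PS = 10.0  # nominal single-line bin width


def quantize(t_ps: float, lsb_ps: float, offset_ps: float) -> float:
    """Ideal uniform TDC whose bin edges are shifted by offset_ps; returns the bin center."""
    k = math.floor((t_ps - offset_ps) / lsb_ps)
    return offset_ps + (k + 0.5) * lsb_ps


def rms_error_ps(n_lines: int, lsb_ps: float = LSB_PS,
                 trials: int = 20000, seed: int = 1) -> float:
    """RMS error when averaging n_lines offset quantizers over random hit times."""
    rng = random.Random(seed)
    offsets = [i * lsb_ps / n_lines for i in range(n_lines)]  # idealized, evenly spaced
    sq_sum = 0.0
    for _ in range(trials):
        t = rng.uniform(0.0, 1000.0)
        estimate = sum(quantize(t, lsb_ps, o) for o in offsets) / n_lines
        sq_sum += (estimate - t) ** 2
    return math.sqrt(sq_sum / trials)


def multiplexed_throughput_sps(n_lines: int, per_line_sps: float = 300e6) -> float:
    """Round-robin dispatch of hits to n_lines independent TDCs."""
    return n_lines * per_line_sps


if __name__ == "__main__":
    for n in (1, 2, 4, 10):
        print(n, round(rms_error_ps(n), 3))
```

For a single line the RMS error should be close to the uniform-quantization value LSB/√12, about 2.89ps, and it falls roughly as 1/N in this idealized model; uneven offsets and bin widths would reduce the gain.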

**Conclusion** - Taking into account the FPGA architecture while designing an FPGA-based TDC is essential, especially when accuracy is critical. FPGA-based TDCs can achieve sub-10ps resolution, and can be applied in a wide range of applications requiring high throughput, sub-100ps accuracy, and fast processing speed.

Acknowledgments - The authors would like to thank Xilinx, Inc. for hardware donations.

[1] W. W. Moses, S. Buckley, Q. P. C. Vu, N. Pavlov, W.-S. Choong, J. Wu, and C. Jackson, "OpenPET: A flexible electronics system for radiotracer imaging," *2009 IEEE Nuclear Science Symposium Conference Record*, pp. 3491–3495, November 2009.

[2] C. Favi and E. Charbon, "A 17 ps resolution, temperature compensated time-to-digital converter in FPGA technology," *FPGA09*, vol. 1, pp. 1–8, February 2009.

[3] B. K. Swann, B. J. Blalock, L. G. Clonts, D. M. Binkley, J. M. Rochelle, E. Breeding, and K. M. Baldwin, "A 100-ps time-resolution CMOS time-to-digital converter for positron emission tomography imaging applications," *IEEE Journal of Solid-State Circuits*, vol. 39, pp. 1839–1852, November 2004.



Figure 1: **Delay line architecture** — with the delay element structure enlarged



Figure 2: Visualization of on-chip clock regions



Figure 3: **DNL and INL ranges** — range being the maximum value in LSB minus the minimum value in LSB

Table 1: Summary of the best implemented TDC

| Parameter          | Value             |
|--------------------|-------------------|
| Resolution at 25°C | 10ps              |
| DNL range          | [-1, 1.5] LSB     |
| INL range          | [-2.25, 1.61] LSB |
| Throughput         | 300MS/s           |
| Clock speed        | 600MHz            |
| Range              | 10ms              |



Figure 4: Best implemented DNL/INL



Figure 5: **Process difference between two FPGAs**. Arrows indicate the TDC alignment; in total, 161 TDCs were tested.



| Temp. | Res. (ps) | $\mu$ (V) | $\sigma$ (mV) |
|-------|-----------|-----------|---------------|
| 10°C  | 9.8       | 1.0096    | 2.9           |
| 40°C  | 10.22     | 1.0034    | 1.9           |
| 60°C  | 10.48     | 0.9993    | 3.2           |

Figure 6: INL temperature behavior