## Chapter 10 PROBLEMS

- 1. [C, None, 9.2] For the circuit in Figure 0.1, assume a unit delay through the Register and Logic blocks (i.e.,  $t_R = t_L = 1$ ). Assume that the registers, which are positive edge-triggered, have a set-up time  $t_S$  of 1. The delay through the multiplexer  $t_M$  equals 2  $t_R$ .
  - **a.** Determine the minimum clock period. Disregard clock skew.
  - **b.** Repeat part a, factoring in a nonzero clock skew: $\delta = t'_{\theta} - t_{\theta} = 1$.
  - **c.** Repeat part a, factoring in a nonzero clock skew: $\delta = t'_{\theta} - t_{\theta} = 4$.
  - **d.** Derive the maximum positive clock skew that can be tolerated before the circuit fails.
  - **e.** Derive the maximum negative clock skew that can be tolerated before the circuit fails.



Figure 0.1 Sequential circuit.
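The kind of bookkeeping that parts a through e call for can be sketched with the standard edge-triggered timing inequalities: $T + \delta \ge t_{clk-q} + t_{logic,max} + t_{S}$ for setup, and $\delta \le t_{clk-q,min} + t_{logic,min} - t_{hold}$ to avoid a race. The function names, the single-path model, and the sample numbers below are assumptions for illustration only; they are not read off Figure 0.1, whose critical and shortest paths must be identified separately. Positive skew is assumed to mean that the receiving register's clock edge arrives later.

```python
def min_clock_period(t_clk_q, t_logic_max, t_setup, skew=0.0):
    """Smallest T satisfying T + skew >= t_clk_q + t_logic_max + t_setup."""
    return t_clk_q + t_logic_max + t_setup - skew


def max_positive_skew(t_clk_q_min, t_logic_min, t_hold=0.0):
    """Upper bound on skew before new data races through the receiving register."""
    return t_clk_q_min + t_logic_min - t_hold


if __name__ == "__main__":
    # Hypothetical path (NOT Figure 0.1): register delay 1, two unit-delay
    # logic blocks in series, setup time 1.
    print(min_clock_period(1, 2, 1))          # no skew
    print(min_clock_period(1, 2, 1, skew=1))  # positive skew relaxes the setup bound
    print(max_positive_skew(1, 1))            # race bound for a one-block shortest path
```

Under these assumptions, positive skew shortens the minimum period but tightens the race condition, which is why a large skew can break the circuit regardless of the clock period.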

- 2. This problem examines sources of skew and jitter.
  - **a.** A balanced clock distribution scheme is shown in Figure 0.2. For each source of variation, identify whether it contributes to skew or jitter. Circle your answer in Table 0.1.



Figure 0.2 Sources of Skew and Jitter in Clock Distribution.

| Source of variation                            | Skew | Jitter |
|------------------------------------------------|------|--------|
| 1) Uncertainty in the clock generation circuit | Skew | Jitter |
| 2) Process variation in devices                | Skew | Jitter |
| 3) Interconnect variation                      | Skew | Jitter |
| 4) Power Supply Noise                          | Skew | Jitter |
| 5) Data Dependent Load Capacitance             | Skew | Jitter |
| 6) Static Temperature Gradient                 | Skew | Jitter |

Table 0.1 Sources of Skew and Jitter

**b.** Consider a Gated Clock implementation where the clock to various logical modules can be individually turned off as shown in Figure 0.3 (i.e., *Enable*<sub>1</sub>, ..., *Enable*<sub>N</sub> can take on different values on a cycle-by-cycle basis). Which approach (*A* or *B*) results in lower *jitter* at the output of the input clock driver? (Hint: consider gate capacitance.) Explain.



Figure 0.3 Jitter in clock gating

**3.** Figure 0.4 shows a latch based pipeline with two combinational logic units.



Figure 0.4 Latch Based Pipeline

Recall that the timing diagram of a combinational logic block and a latch can be drawn as follows, where the shaded region represents that the data is not ready yet.



Figure 0.5 Timing diagrams of combinational logic and latch

Assume that the contamination delay  $t_{cd}$  of the combinational logic block is zero, and the  $t_{clk-q}$  of the latch is zero too.

**a.** Assume the following timing for the input *I*. Draw the timing diagram for the signals *a*, *b*, *c*, *d* and *e*. Include the clock in your drawing.



Figure 0.6 Input timing

- **b.** State the deadline for the computation of the signals *b* and *d*, i.e., the latest time at which they can be computed, relative to the clock edges. In your diagram for (a), label with a "< >" the "slack time" by which the signals *b* and *d* are ready before the latest time they must be ready.
- **c.** Hence deduce how much the clock period can be reduced for this shortened pipeline. Draw the modified timing diagram for the signals *a*, *b*, *c*, *d*, and *e*. Include the clock in your drawing.
- 4. Consider the circuit shown in Figure 0.7.



- **a.** Use SPICE to measure  $t_{max}$  and  $t_{min}$ . Use a minimum-size NAND gate and inverter. Assume no skew and a zero rise/fall time. For the registers, use the following:
- A TSPC Register.
- A C<sup>2</sup>MOS Register.
- **b.** Introduce clock skew, both positive and negative. How much skew can the circuit tolerate and still function correctly?
- c. Introduce finite rise and fall time to the clocks. Show what can occur and describe why.

5. Consider the following latch based pipeline circuit shown in Figure 0.8.

Assume that the input, *IN*, is valid (i.e., set up) 2ns before the falling edge of *CLK* and is held till the falling edge of *CLK* (there is no guarantee on the value of *IN* at other times). Determine the maximum *positive* and *negative* skew on *CLK*' for correct functionality.



6. For the L1-L2 latch-based system from Figure 0.9, with two overlapping clocks, derive all the necessary constraints for proper operation of the logic. The latches have setup times $T_{SU1}$ and $T_{SU2}$, data-to-output delays $T_{D-Q1}$ and $T_{D-Q2}$, clock-to-output delays $T_{Clk-Q1}$ and $T_{Clk-Q2}$, and hold times $T_{H1}$ and $T_{H2}$, respectively. Relevant clock parameters are also illustrated in Figure 0.9. The constraints should relate the logic delays, clock period, overlap time $T_{OV}$, and pulse widths *PW*1 and *PW*2 to latch parameters and skews.



Figure 0.9 Timing constraints

- 7. For the self-timed circuit shown in Figure 0.10, make the following assumptions. The propagation through the NAND gate can be 5 nsec, 10 nsec, or 20 nsec with equal probability. The logic in the succeeding stages is such that the second stage is always ready for data from the first.
  - **a.** Calculate the average propagation delay with  $t_{hs} = 6$  nsec.
  - **b.** Calculate the average propagation delay with  $t_{hs}$ =12 nsec.



**c.** If the handshaking circuitry is replaced by a synchronous clock, what is the smallest possible clock period (i.e., the highest usable clock frequency)?
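The averaging over equally likely NAND delays can be sketched in a few lines of Python. The delay values come from the problem statement; how the handshake time $t_{hs}$ combines with the logic delay depends on Figure 0.10, so the `serial_handshake` model below (overhead simply added to each operation) is only an assumed placeholder, and a different combining rule can be passed in its place.

```python
from fractions import Fraction

# NAND propagation delays in nsec, equally likely (from the problem statement).
NAND_DELAYS_NS = (5, 10, 20)


def average_delay(delays, per_operation=lambda d: d):
    """Mean of per_operation(d) over equally likely delays d, as an exact fraction."""
    return sum(Fraction(per_operation(d)) for d in delays) / len(delays)


def serial_handshake(t_hs):
    """ASSUMED model: each operation costs its logic delay plus t_hs."""
    return lambda d: d + t_hs


if __name__ == "__main__":
    print(float(average_delay(NAND_DELAYS_NS)))                       # logic delay alone
    print(float(average_delay(NAND_DELAYS_NS, serial_handshake(6))))  # assumed model, t_hs = 6 nsec
    print(max(NAND_DELAYS_NS))  # worst-case delay a synchronous clock would have to cover
```

Using exact fractions keeps the averages free of rounding until they are printed.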

8. Lisa and Marcus Allen have a luxurious symphony hall date. After pulling out of their driveway, they pull up to a four-way stop sign. They pulled up to the sign at the same time as a car on the cross-street. The other car, being on the right, had the right-of-way and proceeded first. On the way they also have to stop at traffic signals. There is so much traffic on the freeway, the metering lights are on. Metering lights regulate the flow of merging traffic by allowing only one lane of traffic to proceed at a time. With all the traffic, they arrive late for the symphony and miss the beginning. The usher does not allow them to enter until after the first movement.

On this trip, Lisa and Marcus proceeded through both synchronizers and arbiters. Please list all and explain your answer.

**9.** Design a self-timed FIFO. It should be six stages deep and have a two-phase handshake with the outside world. The black-box view of the FIFO is given in Figure 0.11.



**10.** System design issues in self-timed logic

One of the benefits of using self-timed logic is that it delivers average-case performance rather than the worst-case performance that must be assumed when designing synchronous circuits. In applications where the average and worst cases differ significantly, this can yield a substantial performance improvement. Here we consider the case of ripple-carry addition. In a synchronous design, the ripple-carry adder must be assumed to exhibit its worst-case performance, which means a carry-propagation chain of length N for an N-bit adder. However, as we will prove during the course of this problem, the average length of the carry-propagation chain, assuming uniformly distributed input values, is in fact O(log N)!
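Before working through the proof, the O(log N) claim can be checked empirically. The Monte Carlo sketch below measures the longest carry chain over random operand pairs and prints it next to $\log_2 n$. The counting convention is an assumption: a chain starts with length 1 at a bit where both operand bits are 1, grows by one for each following bit where the operand bits differ, and ends otherwise. The function names, trial count, and seed are likewise illustrative.

```python
import math
import random


def longest_carry_chain(a, b, n_bits):
    """Length of the longest carry chain when adding two n_bits-wide integers."""
    longest = current = 0
    for i in range(n_bits):
        a_i = (a >> i) & 1
        b_i = (b >> i) & 1
        if a_i and b_i:
            current = 1            # carry generated: a new chain begins here
        elif a_i != b_i and current:
            current += 1           # incoming carry is propagated one more bit
        else:
            current = 0            # carry absorbed, or no carry to propagate
        longest = max(longest, current)
    return longest


def average_longest_chain(n_bits, trials=2000, seed=0):
    """Monte Carlo estimate of the mean longest chain for uniform random operands."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += longest_carry_chain(rng.getrandbits(n_bits),
                                     rng.getrandbits(n_bits), n_bits)
    return total / trials


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        print(n, round(average_longest_chain(n), 2), round(math.log2(n), 2))
```

Doubling the word length should raise the estimated average by roughly one bit, in contrast to the worst-case chain, which doubles.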

- **a.** Given that $p_n(v) = \Pr(\text{carry chain of an } n\text{-bit addition is} \ge v \text{ bits})$, what is the probability that the carry chain is of length *k* for an *n*-bit addition?
- **b.** Given your answer to part (a), what is the average length of the carry chain (i.e.,  $a_n$ )? Simplify your answer as much as possible.

Now $p_n(v)$ can be decomposed into two mutually exclusive events, A and B, where A is the event that a carry chain of length $\ge v$ occurs in the first *n*-1 bits, and B is the event that a carry chain of length *v* ends on the *n*th bit.

c. Derive an expression for Pr(A).

- **d.** Derive an expression for Pr(B). (HINT: a carry bit i is propagated only if  $a_i \neq b_i$ , and a carry chain begins only if  $a_i = b_i = 1$ ).
- **e.** Combine your results from (c) and (d) to derive an expression for $p_n(v) - p_{n-1}(v)$ and then bound this result from above to yield an expression in terms of only the length of the carry chain (i.e., *v*).
- **f.** Using what you've shown thus far, derive an upper bound for the expression:

$$\sum_{i=v}^{n} (p_i(v) - p_{i-1}(v))$$

Use this result, coupled with the fact that  $p_n(v)$  is a probability (i.e., it's bounded from above by 1), to determine a two-part upper bound for  $p_n(v)$ .

- **g.** (The magic step!) Bound *n* by a clever choice of *k* such that $2^k \le n \le 2^{k+1}$ and exploit the fact that $\log_2 x$ is concave down on $(0, \infty)$ to ultimately derive that $a_n \le \log_2 n$, which concludes your proof!
- **h.** Theoretically speaking, how much faster would a self-timed 64-bit ripple carry adder be than its synchronous counterpart? (You may assume that the overhead costs of using self-timed logic are negligible).
- 11. Figure 0.12 shows a simple synchronizer. Assume that the asynchronous input switches at a rate of approximately 10 MHz and that $t_r = 2$ nsec, $f_{\phi} = 50$ MHz, $V_{IH} - V_{IL} = 0.5$ V, and $V_{DD} = 2.5$ V.
  - **a.** If all NMOS devices are minimum-size, find $(W/L)_p$ required to achieve $V_{MS} = 1.25$ V. Verify with SPICE.
  - **b.** Use SPICE to find $\tau$ for the resulting circuit.
  - **c.** What waiting time *T* is required to achieve an MTF of 10 years?
  - **d.** Is it possible to achieve an MTF of 1000 years (where  $T > T_{\phi}$ )? If so, how?



Figure 0.12 Simple synchronizer
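The waiting-time calculation can be sketched as follows, assuming the usual synchronizer failure model: an error rate at zero waiting time of $N_{sync}(0) = \frac{V_{IH} - V_{IL}}{V_{swing}} \, t_r \, f_{signal} \, f_{\phi}$, which decays as $e^{-T/\tau}$, with $MTF = 1/N_{sync}(T)$. Taking $V_{swing} = V_{DD}$ and the value of $\tau$ used below are assumptions; the actual $\tau$ must come from the SPICE measurement in part b.

```python
import math

SECONDS_PER_YEAR = 365 * 24 * 3600


def sync_error_rate_at_zero(v_ih_minus_v_il, v_swing, t_r, f_signal, f_clk):
    """Errors per second if the synchronizer output is sampled with no waiting time."""
    return (v_ih_minus_v_il / v_swing) * t_r * f_signal * f_clk


def required_wait_time(mtf_seconds, n_sync_0, tau):
    """Waiting time T for which 1 / (n_sync_0 * exp(-T / tau)) equals mtf_seconds."""
    return tau * math.log(mtf_seconds * n_sync_0)


if __name__ == "__main__":
    # Values from the problem statement; V_swing = V_DD is an assumption.
    n0 = sync_error_rate_at_zero(0.5, 2.5, 2e-9, 10e6, 50e6)
    tau = 100e-12  # PLACEHOLDER: replace with the tau measured in part b
    for years in (10, 1000):
        t_wait = required_wait_time(years * SECONDS_PER_YEAR, n0, tau)
        print(years, "years ->", t_wait, "s")
```

Because *T* grows only logarithmically with the target MTF, a 100x longer MTF costs just a few additional time constants of waiting.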



12. Explain how the phase-frequency comparator shown in Figure 0.13 works.

Figure 0.13 Phase-frequency comparator

- 13. The heart of any static latch is the cross-coupled structure shown in Figure 0.14 (part a).
  - **a.** Assuming identical inverters with $W_p/W_n = k_n'/k_p'$, what is the metastable point of this circuit? Give an expression for the time trajectory of $V_Q$, assuming a small initial $V_{d0}$ centered around the metastable point of the circuit, $V_M$.



b. The circuit in part b has been proposed to detect metastability. How does it work? How would you generate a signal M that is high when the latch is metastable?


- **c.** Consider the circuit of part c. This circuit was designed in an attempt to defeat metastability in a synchronizer. Explain how the circuit works. What is the function of the delay element?
- **14.** An adjustable duty-cycle clock generator is shown in Figure 0.15. Assume the delay through the delay element matches the delay of the multiplexer.
  - **a.** Describe the operation of this circuit.
  - **b.** What is the range of duty cycles that can be achieved with this circuit?
  - **c.** Using an inverter and an additional multiplexer, show how to make this circuit cover the full range of duty cycles.



- **15.** The circuit style shown in Figure 0.16.a has been proposed by Acosta et al. as a new self-timed logic style. This structure is known as a Switched Output Differential Structure (SODS)<sup>1</sup>.
  - **a.** Describe the operation of the SODS gate in terms of its behavior during the pre-charge phase, and how a valid completion signal can be generated from its outputs.
  - **b.** What are the advantages of using this logic style in comparison to the DCVSL logic style given in the notes?
  - **c.** What are the disadvantages of using this style in comparison to DCVSL?
  - **d.** Figure 0.16.b shows a 2-input AND gate implemented in the SODS style. Simulate the given circuit using Hspice. Do you notice any problems? Explain the cause of any problems that you observe and propose a fix. Re-simulate your corrected circuit and verify that you have in fact fixed the problem(s).

<sup>1</sup> A.J. Acosta, M. Valencia, M.J. Bellido, J.L. Huertas, "SODS: A New CMOS Differential-type Structure," *IEEE Journal of Solid-State Circuits*, vol. 30, no. 7, July 1995, pp. 835-838.



Figure 0.16 a - SODS Logic Style

Figure 0.16 b - 2-input AND Gate in SODS Style

16. Voltage-Controlled Ring Oscillator.

In this problem, we will explore a voltage-controlled oscillator based on John G. Maneatis' paper "Low Jitter Process-Independent DLL and PLL Based on Self-Biased Techniques," which appeared in the *IEEE Journal of Solid-State Circuits* in November 1996. We will focus on a critical component of the PLL design: the voltage-controlled ring oscillator. Figure 0.17 shows the block diagram of a voltage-controlled ring oscillator:



Figure 0.17 Voltage Controlled Ring Oscillator

The control voltage, Vctl, is sent to a bias generator that generates two voltages used to bias each delay cell equally, so that an equal delay (assuming no process variations) appears across each delay cell. The delay cells are simple, "low-gain," fully differential input and output operational amplifiers that are connected in such a way that oscillations will occur at any one of the outputs with a frequency of 1/(4\*delay). Each delay is modeled as an RC time constant; C comes from parasitic capacitances at the output nodes of the delay element, and R comes from the variable resistor that is the load for the delay cell. Below is a circuit schematic of a typical delay cell.

Figure 0.18 One delay Cell

As mentioned before, the value of R is set by a variable resistor. How can one make a variable resistor? The object in the delay cell that is surrounded by a dotted line is called a "symmetric load," and provides the answer to a voltage-controlled variable resistor. R should be linear so that the differential structure cancels power supply noise. We will begin our analysis with the symmetric load.

**a.** In Hspice, input the circuit below and plot Vres on the X axis and Ires on the Y axis, for the following values of Vctlp: 0.5, 0.75, 1.0, 1.25, 1.5, 1.75, and 2.0 volts, by varying Vtest from Vctlp to Vdd, all on the same graph. For each curve, plot Vres from 0 volts to Vdd-Vctlp. When specifying the Hspice file, be sure to estimate area and perimeter of drains/sources.



Figure 0.19 Symmetric Load Test Circuit

After you have plotted the data and printed it out, use a straight edge to connect the end points for each curve. What do you notice about intersection points between the line you drew over each curve, and the curves themselves? Describe any symmetries you see.

- **b.** For each Vctlp curve that you obtained in a), extract the points of symmetry (Vres, Ires), and find the slope of the line around these points of symmetry. These are the effective resistances of the resistors. Also, for each Vctlp curve, state the maximum amplitude the output swing can have without running into asymmetries. Put all of this data in a worksheet format.
- **c.** Using the estimates you made for the area and perimeter of the drain and source in your Hspice file, calculate the effective capacitance. (Just multiply area and perimeter by CJ and CJSW from the SPICE deck.) Since we are placing these delay elements in a cascaded fashion, remember to INCLUDE THE GATE CAPACITANCE of the following stage. Each delay element is identical to the others. Now, calculate the delay in each cell for each setting of Vctlp that you used in a): delay=0.69\*R\*C. Then, write a general equation, in terms of R and C, for the frequency value that will appear at each delay output. Why is it necessary to cross the feedback lines for the ring oscillator in the first figure? Finally, draw a timing/transient analysis of each output node of the delay lines. How many phases of the base frequency are there?
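The hand calculation for each Vctlp setting can be organized as in the sketch below, which takes the two relations stated in this problem at face value: delay = 0.69\*R\*C per cell and an oscillation frequency of 1/(4\*delay). The R and C values are hypothetical placeholders; the real ones come from the slopes extracted in part b and the capacitance estimate above.

```python
def cell_delay(r_ohms, c_farads):
    """Delay of one cell modeled as an RC time constant: 0.69 * R * C."""
    return 0.69 * r_ohms * c_farads


def oscillation_frequency(r_ohms, c_farads):
    """Ring frequency using the problem's stated relation f = 1 / (4 * delay)."""
    return 1.0 / (4.0 * cell_delay(r_ohms, c_farads))


if __name__ == "__main__":
    c_eff = 20e-15  # PLACEHOLDER effective capacitance in farads
    # PLACEHOLDER effective resistances (ohms) keyed by Vctlp (volts)
    r_eff = {0.5: 5e3, 1.0: 10e3, 1.5: 20e3}
    for vctlp, r in sorted(r_eff.items()):
        print(vctlp, cell_delay(r, c_eff), oscillation_frequency(r, c_eff))
```

Keeping the two relations in separate functions makes it easy to swap in a different frequency expression if the analysis of the ring leads to one.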

d. Now, we will look at the bias generator. The circuit for the bias generator is as follows:



Figure 0.20 Bias Generator

Implement this circuit in Hspice, and use an ideal voltage-controlled voltage source for your amplifier. Use a value of 20 for A. This circuit automatically sets the Vctln and Vctlp voltages supplied to the buffer delays, which sets the DC operating points of the delay cells such that the symmetric load swings around its point of symmetry for a given Vctl voltage. Also, it is important to note that Vctl is the same as Vctlp. The circuit must go through this process to obtain Vctln (which sets the bias current to the correct value, which in turn sets the DC operating point of the buffer). Do a transient run in Hspice to verify that Vctlp is indeed very close to Vctl over a range of inputs for Vctl. Show a SPICE transient simulation that runs for 1 µs and switches Vctl in a PWL waveform across a range of inputs between 0.5 V and 2.0 V. For extra points, explain how this circuit works.

e. Now, hook up the bias generator you just built with 4 delay cells, as shown in the first figure. For each control voltage Vctlp from part c), verify your hand calculations with SPICE simulations. Show a spreadsheet of obtained frequencies vs. hand-calculation predictions, and in a separate column, calculate % error. Give a brief analysis of what you see. Print out all of the phases (4) of the clock, for a Vctl value of your choice.