# Analysis and Verification of Interconnected Rings as Clock Distribution Networks

Manuel Salim Maza and Mónico Linares Aranda. Instituto Nacional de Astrofísica, Optica y Electrónica, INAOE Luis Enrique Erro 1. Sta. Ma. Tonantzintla Puebla, Pue. México. Apdo. Postal 51 and 216, P.C. 7200. 011 52 (222) 2663100 ext. 1108 and 1420.

msalim@susu.inaoep.mx, mlinares@inaoep.mx

# ABSTRACT

The interconnected-rings approach, used as a globally asynchronous, locally synchronous clock distribution network, offers good performance in terms of scalability, low clock skew and high-speed clocking. Moreover, its metal cost grows linearly and its power consumption is directly proportional to the number of interconnected rings. In this paper, the performance of interconnected rings working as clock distribution networks is analyzed and verified by experimental measurements. Typical parameters of the 3.3 V, $0.35\,\mu$m CMOS N-well AMS process were used for the analysis and chip fabrication. It is shown that interconnected rings are robust under parameter variations.

# **Categories and Subject Descriptors**

B.7.1 [Integrated Circuits]: Types and Design styles – *microprocessors and microcomputers, VLSI (very large scale integration)* 

#### **General Terms**

Measurement, Performance, Design.

#### Keywords

Clock distribution networks, ring oscillators, GALS.

#### **1. INTRODUCTION**

In recent years, demand for portable equipment such as cellular telephones, laptops, cameras and audio players has grown. It is desirable for this equipment to incorporate more functions and improve its performance in order to be more attractive to consumers, while keeping power consumption low, which translates into longer battery charge duration and lifetime. Such equipment executes most of its processes and functions in an integrated circuit, in a synchronous digital manner. This implies a clock distribution network carrying the clock signal(s) to every node that requires it: for instance, hundreds of thousands of transistor gates spread across the chip, representing the largest on-chip capacitive load [1].

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

*GLSVLSI'04*, April 26-28, 2004, Boston, Massachusetts, USA. Copyright 2004 ACM 1-58113-853-9/04/0004...\$5.00.

In many applications, clock distribution networks (CDNs) carry the highest frequencies in the system, span the longest interconnections and consume 30 to 60% of the system power [2]-[4]. Reducing this budget therefore yields a significant reduction in the power consumption of the whole system. Because CDN design is complex, tools that automatically route the net, insert buffers and design the network are needed. The resulting designs are not reliable at frequencies above 1 GHz (or even above 500 MHz), because the interconnection models used cause the actual performance to deviate from the expected one [5]. In general, the performance of the CDN strongly affects the peak performance of the system.

The maximum distance that a switching signal can travel without its propagation being limited by the time of flight (TOF) defines a region known as the isochronous region. The size of this region decreases as the relative length, chip area and operating frequency increase [1], as shown in fig. 1. Most systems distribute a single original clock signal globally (global nets). However, as systems become larger (MCM and SoC) and frequencies higher, the TOF problem worsens and nets that distribute the clock signal optimally are required [1]. New techniques have been proposed to solve this problem in CDNs: some offer solutions at the process or fabrication level, such as the flip-chip package [1]; others work at the architecture level, such as asynchronous communication between blocks [7]; and others use interconnected rings or oscillators as (local net) CDNs [8]-[12]. The interconnected rings technique has been shown to generate, lock and feed a clock signal to more than one chip reliably [9], and has been used for quadrature generators and with a sleep mode [10]. Further modifications to this scheme oscillate faster but are less robust, have a reduced output swing, or consume more power or area [11], [12].

Figure 1. Chip size and time of flight (TOF) relationship for a microprocessor with minimum dimension of 50nm and $\varepsilon$=1.5.
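As a rough illustration of why the isochronous region shrinks with chip length and frequency, the sketch below compares an idealized time of flight with the clock period. It assumes (this is not stated in the paper) that the signal travels at $c/\sqrt{\varepsilon}$, which ignores RC delay and therefore gives only a lower bound. The function names, the 24 mm length and the 1 GHz clock are illustrative choices; $\varepsilon$=1.5 is taken from the Figure 1 caption.

```python
import math

C_VACUUM = 299_792_458.0  # speed of light in vacuum [m/s]


def time_of_flight(length_m: float, eps_r: float) -> float:
    """Idealized time of flight over a wire of the given length.

    Assumption: propagation at c / sqrt(eps_r). RC delay is ignored,
    so this is only a lower bound on the real wire delay.
    """
    return length_m * math.sqrt(eps_r) / C_VACUUM


def tof_fraction_of_period(length_m: float, eps_r: float, freq_hz: float) -> float:
    """Time of flight expressed as a fraction of the clock period 1/f."""
    return time_of_flight(length_m, eps_r) * freq_hz


if __name__ == "__main__":
    # Illustrative values: 24 mm chip length, eps = 1.5, 1 GHz clock.
    tof = time_of_flight(24e-3, 1.5)
    print(f"TOF = {tof * 1e12:.1f} ps")
    print(f"TOF / T = {tof_fraction_of_period(24e-3, 1.5, 1e9):.3f}")
```

Under these assumptions the fraction grows linearly with both length and frequency, which is the trend the text attributes to fig. 1.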

The interconnected rings technique was presented in [11] as a globally asynchronous, locally synchronous (GALS) CDN for chip lengths from 4 to 24 mm, since these lengths are a trend in VLSI projects such as MCM and SoC designs [1].

In this work it is demonstrated that interconnected rings are robust under fabrication process variations and can achieve high frequency with low clock skew. Experimental results for an array of 16 interconnected rings agree with simulations performed with HSpice<sup>TM</sup>. The paper is organized as follows. In Section 2, the interconnected ring topology used is described and analyzed. In Section 3, a fabricated array of 16 non-expanded interconnected rings is presented, together with experimental measurements of its frequency, skew and power consumption. In Section 4, the conclusions of this work are given.

#### 2. INTERCONNECTED RINGS

Interconnected rings have been studied for 5 stages [9] and for 3 stages with a triangular configuration [10]. For further information about the analytical model, see [8]. These works deal with CDNs that we call "local" (from GALS). We compare and verify different CDN topologies (local and global) in order to obtain a methodology that considers their combinations.

We presented the topology 3 inv. $\pm 45^{\circ}$ 1:1 in [11]. The structure of 16 interconnected rings of 3 inverter stages covering an 8 mm x 8 mm chip area is shown in fig. 2. The inverters in the array have Wn=3µm and Wp=6µm; the inverter-buffers have Wn=20µm and Wp=40µm. All of them have L=Lmin=0.35µm. The inverter-buffer isolates the rings from the load: when the load at every sink changes from 50fF to 400fF, the frequency of the array changes by 5% and the power consumption by 12%, while the output swing is kept from 10% to 90% of Vdd. Refer to table I for simulation results. Better isolation, a larger load or a larger expected variation requires larger buffers. This topology was analyzed for up to 144 interconnected expanded rings covering a 24 mm x 24 mm area and compared with other approaches in [11]. It was shown there that local nets have linear metal cost, whereas global nets have exponential metal cost per stage. This is an important observation, because the current trend is to increase the number of stages in the nets, which also increases the complexity of global nets. By repeating the basic ring, the array satisfactorily keeps the properties of the basic cell, and the power consumption is proportional to the number of rings [8]-[12].

The oscillation conditions in the interconnected rings are easily met by using an odd number N of digital inverters in each ring, since they have very large gain; for other kinds of inverters or oscillators, the conditions and the design can be very tight [11]. The frequency of the array is close to, but not the same as, the frequency of a single cell, because of border effects in the lattice.
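For a single ring, a common first-order estimate of the oscillation frequency of an N-stage inverter ring is f = 1/(2·N·t_d), where t_d is the average delay per stage. The sketch below applies it to a 3-inverter ring. This is a textbook approximation, not the analytical model of the interconnected array (see [8]), and it does not capture the border effects mentioned above; the function names are illustrative, and the 1.15 GHz value is simply the 50 fF entry of Table I used as an example input.

```python
def ring_frequency(n_stages: int, stage_delay_s: float) -> float:
    """First-order ring-oscillator frequency, f = 1 / (2 * N * t_d).

    Assumption: all stages have the same average delay t_d.
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * stage_delay_s)


def stage_delay(n_stages: int, freq_hz: float) -> float:
    """Effective per-stage delay implied by an observed ring frequency."""
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("an inverter ring needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * freq_hz)


if __name__ == "__main__":
    # Example: a 3-inverter ring running at 1.15 GHz.
    t_d = stage_delay(3, 1.15e9)
    print(f"effective stage delay = {t_d * 1e12:.1f} ps")
```

Read this way, an array frequency slightly different from the single-cell frequency corresponds to a slightly different effective stage delay in the lattice.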

Diagonal metal lines can be replaced by L-shaped segments and bends to leave square or rectangular spaces (as in fig. 5-a). Metal serpentines and different transistor and interconnection widths can be used to generate irregular patterns according to floor-planning requirements, as long as those changes compensate each other and do not considerably affect the array performance.

Figure 2. 16 interconnected 3-inverter stage rings at $\pm 45^{\circ}$ 1:1 (one ring per sink) with buffers at each sink.

TABLE I. Power consumption, frequency and plateau time vs load on the 16-interconnected-rings array.

| C<sub>L</sub> [fF] | F [GHz] | P [mW] | T<sub>plat</sub>/T [%] |
|--------------------|---------|--------|------------------------|
| 50                 | 1.15    | 117    | 20.4                   |
| 100                | 1.19    | 127    | 15.9                   |
| 200                | 1.2     | 127    | 7.3                    |
| 400                | 1.14    | 132    | 0                      |
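The load sensitivity quoted in the text (about 5% in frequency and 12% in power for loads from 50fF to 400fF) can be checked against Table I. The sketch below computes the spread of each column; the paper does not state which value it normalizes by, so the choice of reference (maximum or minimum of the column) is an assumption, and both are shown.

```python
# Rows of Table I: (C_L [fF], F [GHz], P [mW], T_plat/T [%])
TABLE_I = [
    (50, 1.15, 117, 20.4),
    (100, 1.19, 127, 15.9),
    (200, 1.20, 127, 7.3),
    (400, 1.14, 132, 0.0),
]


def spread(values, reference):
    """Peak-to-peak variation of a column, relative to a chosen reference."""
    return (max(values) - min(values)) / reference


freqs = [row[1] for row in TABLE_I]
powers = [row[2] for row in TABLE_I]

if __name__ == "__main__":
    for name, col in (("frequency", freqs), ("power", powers)):
        rel_max = 100 * spread(col, max(col))
        rel_min = 100 * spread(col, min(col))
        print(f"{name}: {rel_max:.1f}% of max, {rel_min:.1f}% of min")
```

With either reference, the computed spreads bracket the figures quoted in the text.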

The array of 16 interconnected rings was extracted from layout using Cadence *Virtuoso* and simulated in *HSPICE* for the AMS $0.35\,\mu$m technology. The 6$\pi$-RLC model was used for the interconnections. Power consumption, operating frequency, clock skew between sinks and ground bounce were measured.

In order to test the robustness of the array, a 30-case Monte Carlo analysis of the 16 interconnected rings was performed. Variations in $V_t$ (23% for NMOS and 18% for PMOS transistors) and $t_{ox}$ (8%) were considered. Figure 3 shows how the figures of merit of the 16 interconnected rings behave under these fabrication process variations; X denotes the average value, and bars denote the maximum and minimum values. *vh* and *vl* stand for the high and low voltage of the signal in the array. *Pow\*20* stands for the power consumption of the array (including 4 frequency dividers, FDs); *F* stands for the frequency of the array. *GndBnc* stands for ground bounce (and also power bounce); it was measured as the minimum (and maximum) voltage at every node in the array and the FDs, and it is presented relative to Vdd (3.3V). *ClkSkw* stands for clock skew; it was measured as the largest time difference among the 4 inputs of the frequency dividers, and it is presented relative to the period (1/F) of the array.

Figure 3. Figures of merit for the 16-interconnected-rings array with FDs under Monte Carlo analysis.
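The clock-skew figure of merit defined above (largest arrival-time difference among the sinks, relative to the period 1/F) and the average/minimum/maximum summary plotted in fig. 3 can be sketched as follows. The function names and the sample arrival times are hypothetical; they are not measurements from the paper.

```python
from statistics import mean


def relative_clock_skew(arrival_times_s, freq_hz):
    """Largest arrival-time difference among sinks, as a fraction of 1/F."""
    skew_s = max(arrival_times_s) - min(arrival_times_s)
    return skew_s * freq_hz


def summarize(samples):
    """Average, minimum and maximum of a figure of merit over Monte Carlo runs."""
    return mean(samples), min(samples), max(samples)


if __name__ == "__main__":
    # Hypothetical clock-edge arrival times at the 4 frequency-divider inputs.
    arrivals = [0.0, 5e-12, 12e-12, 8e-12]
    print(f"ClkSkw = {100 * relative_clock_skew(arrivals, 1.2e9):.2f}% of period")

    # Hypothetical relative-skew values from three Monte Carlo cases.
    avg, lo, hi = summarize([0.010, 0.014, 0.012])
    print(f"avg={avg:.3f} min={lo:.3f} max={hi:.3f}")
```

Expressing the skew as a fraction of the period makes cases with different oscillation frequencies directly comparable.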

#### **3. EXPERIMENTAL RESULTS**

An array of 16 interconnected 3-inverter non-expanded rings was fabricated using $0.35\,\mu$m CMOS AMS technology. Because of the high speed of the rings and the load imposed by the measurement instruments, four by-16 frequency dividers (FDs) were used. The schematic and block diagram of the by-16 FD are shown in fig. 4. The basic structure was selected from [13] because it is simple and works with an input signal of up to 1.7GHz with only 1V of input swing. The signals at quaternary nodes A, B, C and D, as depicted in fig. 2, are the inputs of the four FDs. Figure 5-a depicts the 16-interconnected-rings array and its 4 FDs, and fig. 5-b shows the whole chip.

An output waveform measured in the laboratory from the fabricated chip is shown in fig. 6. The measured values of the clock skew, the power consumption and the internal frequency (the output frequency multiplied by 16) for two different power supply values are presented in table II. Power consumption was obtained by measuring the rms current at dedicated pins in the chip, one pin per structure and another for the pads.
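Because the array is observed through by-16 dividers, the internal frequency is recovered by multiplying the frequency seen at the output by 16. A minimal sketch of that conversion follows; the function names are illustrative, and the example value is derived from the 1.37 GHz experimental figure reported in Table III rather than from a separately reported output measurement.

```python
DIVIDE_RATIO = 16  # each FD divides the ring frequency by 16


def internal_frequency(output_freq_hz, ratio=DIVIDE_RATIO):
    """Frequency inside the array, given the frequency after the divider."""
    return output_freq_hz * ratio


def divided_frequency(internal_freq_hz, ratio=DIVIDE_RATIO):
    """Frequency expected at the divider output for a given array frequency."""
    return internal_freq_hz / ratio


if __name__ == "__main__":
    f_out = divided_frequency(1.37e9)
    print(f"expected output frequency = {f_out / 1e6:.3f} MHz")
    print(f"recovered internal frequency = {internal_frequency(f_out) / 1e9:.2f} GHz")
```

The divided signal is in the tens-of-megahertz range, which is far easier to drive off chip and to capture with laboratory instruments.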

Experimental power consumption and frequency vs $Vdd_{cir}$ are depicted in fig. 7. Values are normalized to the maximum values presented in table II for Vdd<sub>cir</sub>=3.3V. Notice the high linearity of the power consumption and the response of the frequency vs Vdd. For supply variations near 3.3V, the sensitivity of the frequency is lower than at 2V; this is desirable because designs will usually run at the highest possible speed. Finally, a comparison of the simulated and measured frequency and power consumption of the array (including FDs) is presented in table III. $f_{int}$ stands for the frequency of the array inside the chip and $P_{16R4Div}$ stands for the power consumption of the 16-interconnected-rings array and its 4 frequency dividers. The experimentally measured frequency was 6% higher and the power consumption 22% lower than expected from simulation (equivalently, the simulated power is 28% above the measured value), which is very encouraging for this kind of CDN.

Figure 4. By-16 Frequency Divider (FD): a) Basic divider; b) By-16 FD block diagram.

Figure 5. a) Photograph of 16 interconnected rings (right) and its 4 by-16 frequency dividers (left). b) Photograph of the whole chip.

Figure 6. Experimental output waveform from 16 interconnected rings at Vdd<sub>cir</sub>=2V.

TABLE II. Measured internal frequency, clock skew and power consumption on the 16-interconnected-rings array.



Figure 7. Measured power consumption and frequency vs Vdd for the 16-interconnected-rings array. Values are normalized

 
 TABLE III. Simulated vs measured internal frequency and power consumption on the 16-interconnected-rings array.

|              | f <sub>int</sub> [GHz] | P <sub>16R4Div</sub> [mW] |
|--------------|------------------------|---------------------------|
| Simulation   | 1.29                   | 65.1                      |
| Experimental | 1.37                   | 50.7                      |
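The comparison in Table III can be reproduced with a few lines of arithmetic. The sketch below, with illustrative names, reports each difference relative to an explicitly chosen reference, since the resulting percentage depends on whether the simulated or the measured value is used as the base.

```python
# Values from Table III.
SIMULATED = {"f_int_ghz": 1.29, "p_mw": 65.1}
MEASURED = {"f_int_ghz": 1.37, "p_mw": 50.7}


def relative_difference(value, reference):
    """Signed difference of value with respect to reference, as a fraction."""
    return (value - reference) / reference


if __name__ == "__main__":
    df = relative_difference(MEASURED["f_int_ghz"], SIMULATED["f_int_ghz"])
    dp_sim = relative_difference(MEASURED["p_mw"], SIMULATED["p_mw"])
    dp_meas = relative_difference(SIMULATED["p_mw"], MEASURED["p_mw"])
    print(f"frequency, measured vs simulated: {100 * df:+.1f}%")
    print(f"power, measured vs simulated:     {100 * dp_sim:+.1f}%")
    print(f"power, simulated vs measured:     {100 * dp_meas:+.1f}%")
```

Taking the simulation as the reference, the measured power is about 22% lower; taking the measurement as the reference, the simulated power is about 28% higher.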

#### 4. CONCLUSIONS

The trend toward larger systems leads to more complex global nets. Interconnected rings have been proposed and verified for use as GALS clock distribution networks.

In this work it has been shown that interconnected rings are robust under fabrication process variations and have a power consumption proportional to the number of rings. Experimental measurements obtained in the laboratory agree satisfactorily with simulations. Low clock skew and an oscillation frequency of 1.37 GHz with a power consumption of 50.7 mW (including the frequency dividers) were obtained.

#### 5. ACKNOWLEDGMENTS

This work was partially supported by Consejo Nacional de Ciencia y Tecnología (CONACYT-MEXICO) under grant no. 34557-A and scholarship 129236.

#### 6. **REFERENCES**

- [1] Dennis Sylvester and Kurt Keutzer, "Impact of Small Process Geometries on Microarchitectures in System on a Chip", Proceedings of the IEEE, Vol. 89, No. 4, April 2001. pp. 467-489.
- [2] Jatuchai Pangjun and Sachin S. Sapatnekar. "Low-Power Clock Distribution Using Multiple Voltages and Reduced Swings". IEEE Trans. on VLSI Systems, Vol. 10, No. 3, June 2002, pp. 309-318.
- [3] J. Montanaro, R. T. Witek, et al., "A 160-MHz, 32-b, 0.5-W CMOS RISC Microprocessor", IEEE JSSC, Vol. 31, No. 11, pp. 1703-1714, Nov. 1996.
- [4] Ferd E. Anderson et al. "The Core Clock System on the Next Generation Itanium Microprocessor". IEEE Proc. of ISSCC 2002 S8.5.
- [5] William H. Kao, Chi-Yuan Lo, Mark Basel and Raminderpal Singh, "Parasitic Extraction: Current State of the Art and Future Trends", Proceedings of the IEEE, Vol. 89, No. 5, May 2001, pp. 729-739.
- [6] Eby G. Friedman, "Clock Distribution Networks in Synchronous Digital Integrated Circuits", Proceedings of the IEEE, Vol. 89, No. 5, May 2001, pp. 665-692.
- [7] Thomas Meincke et al. "Globally asynchronous locally synchronous architecture for large high-performance ASICs", Proceedings of the IEEE ISCAS, May 30-June 2, 1999, Orlando, Florida, pp. 512-515.
- [8] L. Hall, M. Clements, W. Liu, and G. Bilbro, "Clock Distribution using cooperative ring oscillators", in *Proc.* 17<sup>th</sup> *Conf. Advanced Research in VLSI*, Sept. 1997, pp. 15-16.
- [9] Lars Bengtsson and Bertil Svensson, "A Globally Asynchronous, Locally Synchronous SIMD Processor", Proc. of MPCS'98: Third International Conference on Massively Parallel Computing Systems, Colorado Springs, Colorado, USA, April 2-5, 1998.
- [10] S. Hwang and Gyu Moon. "A Ultra High Speed Clock Distribution Technique Using a Cellular Oscillator Network". Proc. of IEEE ISCAS 2000, May 28-31, Geneva, Switzerland. pp. 589-592.
- [11] M. Salim Maza and M. Linares Aranda. "Interconnected Rings And Oscillators As Gigahertz Clock Distribution Nets". IEEE ACM Great Lakes Symposium on VLSI 2003, 28-29 April, Washington D.C., U.S.A. S2-6S.
- [12] F. O'Mahony, C. Patrick Yue, Mark A. Horowitz and S. Simon Wong, "Design of a 10GHz Clock Distribution Network Using Coupled Standing-Wave Oscillators". *IEEE DAC 2003*, June 2-6, Anaheim, California, USA. S40.1 pp. 682-687.
- [13] V. Stojanovic and V. G. Oklobdzija. "Comparative Analysis of Master-Slave Latches and Flip-Flops for High-Performance and Low-Power Systems". IEEE JSSC, Vol. 34, No. 4, April 1999.