# ANALYSIS OF TIMING FAILURES DUE TO RANDOM AC DEFECTS IN VLSI MODULES

Nandakumar N. Tendolkar

IBM Corporation Data Systems Division Poughkeepsie, N.Y. 12602

# ABSTRACT

This paper presents an analytical model for projecting the yield loss due to random delay defects for modules, or VLSI packages, containing multiple semiconductor chips. A module to be analyzed is characterized by the distribution of its path delays. Statistical analysis is applied to obtain the distribution of delays caused by defects in the logic circuits of LSI chips. The model uses these two distributions to calculate the probability that a module contains a path that does not meet the system timing requirements. All inputs to the model can be obtained well before modules are available for actual testing. The expected module yield loss due to delay defects can therefore be projected before the modules are actually manufactured.

### I. INTRODUCTION

In high-performance large computer systems[1], statistical techniques are used to determine the cycle time (the maximum time allowed for signals to propagate from latch to latch). These techniques take into account the normal variation in the delays of logic circuits and wiring. Random manufacturing defects, such as transistor emitter-collector shorts, open Schottky diodes, and contact opens, cause logic circuits to switch more slowly than normal. Faults caused by such defects are called AC or delay faults[2,3]. An AC defect increases the delay of the paths going through the defective circuit. We consider a system that is packaged using several multi-chip modules, each containing several LSI chips. A module can fail in a system environment if it contains a system path whose delay exceeds the cycle time. Whether a module failure occurs depends on the size of the defect (in nanoseconds) and on the delays of the paths affected by the defect. The problem we investigate is this: given that the LSI chips that make up the module have a certain probability of containing AC defects, what is the probability that the module fails in the system? From a manufacturing point of view, we would like to ensure that no more than a given small fraction of modules fail due to AC defects. By monitoring the manufacturing process, we can ensure that the probability that a logic chip contains an AC defect is low and that the magnitude of a delay fault is small (a few nanoseconds or less). However, this does not guarantee that the probability of AC failure of a module that contains, say, 100 chips is low. To control module AC failures under these assumptions, we develop a mathematical model that calculates the probability that a module fails to meet system timing requirements due to an AC defect.

This model is based on chip-level information on AC defects and a statistical characterization of the paths on the module. Using the model, we can project the percentage of modules that will fail to meet system timing requirements before a single module is built. This matters because the analysis of AC problems in multi-chip VLSI modules is complex. Without the model, one may have to manufacture and test a large number of modules to determine how many fail. If a significant number of modules are found defective, we must then determine whether the failures are due to random AC defects in a chip or to some other type of defect on the chips or the substrate. The cost of detecting module AC failure problems by module testing is therefore prohibitive. The model described in this paper provides a cost-effective and convenient way to analyze the expected module fallout due to AC defects.

In what follows, the probability that a module fails in the system due to an AC defect is denoted by ACQL. In section II, we define the concept of system sensitivity and give an equation for ACQL in terms of system sensitivity, the number of circuits on the module, and the probability that a circuit contains an AC defect. We describe the distribution of path delays on a module in section III and the AC defect size distribution function in section IV. These concepts are crucial to understanding the mathematical model for calculating system sensitivity, which is presented in section V. In section VI, we study a typical multi-chip module to show how the model is used in the analysis of ACQL. We also study the effects of defect size and of the module path delay distribution on system sensitivity and ACQL.

22nd Design Automation Conference

### II. SYSTEM SENSITIVITY

For a given circuit i on a module, we define the system sensitivity, S(i), as the average probability that a random AC defect in logic circuit i causes the module to fail. In defining the system sensitivity of a circuit, the average is taken over all possible defect sizes, and the rest of the circuits on the module are assumed to be defect free. We consider the computation of system sensitivity in section V. Here we show how the module ACQL can be expressed in terms of the system sensitivities S(i). Let N be the number of logic circuits on the module. We assume that each logic circuit is equally likely to have an AC defect, and we denote by p the probability that a logic circuit contains an AC defect. The value of p is generally low, so the chance of two AC defects in a single logic path is also low. We therefore assume that if multiple AC defects occur on a module, they are on distinct system paths. From the definition of the system sensitivity of circuit i, it follows that the probability that a module AC failure occurs due to an AC defect in circuit i is p S(i).

Let E(i) be the event that circuit i is either free of AC defects or contains a defect that does not cause a module AC failure. The probability of E(i), P[E(i)], is given by

$$P[E(i)] = 1 - p\,S(i)$$

Now, the probability that no module AC failure occurs, P(good), is the same as the probability that all N independent events E(i) occur. Therefore,

$$P(\text{good}) = \prod_{i=1}^{N} \bigl(1 - p\,S(i)\bigr)$$

But, from the definition of ACQL,

$$\mathrm{ACQL} = 1 - P(\text{good})$$

Therefore,

$$\mathrm{ACQL} = 1 - \prod_{i=1}^{N} \bigl(1 - p\,S(i)\bigr) \qquad (1)$$
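Equation 1 can be evaluated directly once a system sensitivity is known for every circuit. The following is a minimal Python sketch, assuming the S(i) values are supplied as a list; the function name `acql_exact` and the three sensitivities in the usage line are hypothetical, chosen only to illustrate the formula.

```python
import math
from typing import Sequence


def acql_exact(p: float, sensitivities: Sequence[float]) -> float:
    """Equation 1: ACQL = 1 - prod_i (1 - p * S(i)).

    p             -- probability that one circuit contains an AC defect
    sensitivities -- hypothetical list of system sensitivities S(i), one per circuit
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    # P(good): every circuit is defect free or has a defect that does not fail the module.
    p_good = math.prod(1.0 - p * s for s in sensitivities)
    return 1.0 - p_good


# Hypothetical module with three circuits.
print(acql_exact(0.0001, [0.5, 0.1, 0.0]))
```

The product form relies on the independence of the events E(i) assumed in the text.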

Before we can present the mathematical model for calculating S(i), we describe the distribution of path delays on a module (section III) and the AC defect size distribution (section IV). These concepts are crucial to understanding the mathematical model. A simple example of the interconnection of logic blocks on a module, shown in Figure 1, serves as the basis for the discussion. A block represents a logic circuit, and the number inside the block is the block delay. Thus, for example, the time taken by the signal to get from PI1 to PO in Figure 1 is 4. There are two PIs and one PO. The blocks through which a signal passes in going from a PI to a PO define a path. The delay of the first path (which goes through blocks AA, BA, and CA) is 9, and the delay of the second path (which goes through blocks AB, BB, and CA) is 5.



Figure 1. Example of a Logic Network

Consider a set of modules that are logically identical and free of AC defects. Because of variations in the manufacturing process, the actual delay of a given logic block varies from module to module. Paths on such modules are characterized[1] by the mean and variance of the path delay; the delay of a given path is modeled as a normally distributed random variable. Timing analysis programs[1] identify the longest path through each block and provide the following information for every block:

- Mean delay, m(i), of the path.
- Variance of the path delay distribution.

The mean delay, m(i), varies from block to block. In large multi-chip modules the variance of path delay is approximately constant. Therefore, in our analysis we use a single number z to denote this variance.

To characterize a module, we use timing analysis to determine z and m(i) for each circuit i. Let w(x) be the number of blocks i that have m(i) equal to x. The ratio w(x)/N is then the probability that a randomly chosen circuit has m(i) = x. A convenient way to show the variation of m(i) on a module is to plot the distribution of m(i); an example is shown in Figure 2, where the Y value is w(x). The function w(x) is used in section V.
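One way to tabulate w(x) from timing-analysis output is sketched below. It assumes the per-circuit mean delays m(i) are available as a list of numbers in nanoseconds and that they are rounded to a fixed bin width before counting; the bin width, the function name `path_delay_histogram`, and the sample delays are illustrative assumptions, not part of the original model.

```python
from collections import Counter


def path_delay_histogram(mean_delays, bin_ns=1.0):
    """Return {x: w(x)}: how many circuits have a mean path delay m(i)
    that rounds to x on a grid of width bin_ns (nanoseconds).
    Python's round() uses round-half-to-even on exact ties."""
    counts = Counter(round(m / bin_ns) * bin_ns for m in mean_delays)
    return dict(sorted(counts.items()))


# Hypothetical m(i) values for a six-circuit module.
w = path_delay_histogram([15.2, 15.9, 16.1, 16.4, 17.0, 16.8])
N = sum(w.values())
probs = {x: n / N for x, n in w.items()}  # w(x)/N for each class x
print(w, probs)
```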



Figure 2. Distribution of mean path delay

### IV. DISTRIBUTION OF DEFECT SIZE

The size of an AC defect is the additional delay in the switching time of a circuit caused by the defect. Methods of measuring circuits to determine the distribution of AC defect sizes in logic circuits are discussed in References 3, 4, and 5. They use in-line process monitoring to determine the probability of an AC defect and the size of a defect. Data are collected over a large number of circuits, and the distribution is obtained from them. An example of a typical defect size distribution is shown in Figure 3. The defect size distribution gives f(d), the conditional probability that, given that a circuit has an AC defect, the size of the defect is d.
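The estimate of f(d) described above amounts to an empirical probability mass function over the observed defect sizes. A minimal sketch, assuming the measured defect sizes are already rounded to the reporting resolution and given in nanoseconds; the name `defect_size_pmf` and the sample measurements are hypothetical.

```python
from collections import Counter


def defect_size_pmf(defect_sizes_ns):
    """Estimate f(d): among circuits that have an AC defect, the fraction
    whose defect adds d nanoseconds of delay."""
    if not defect_sizes_ns:
        raise ValueError("need at least one measured defect")
    counts = Counter(defect_sizes_ns)
    total = len(defect_sizes_ns)
    return {d: n / total for d, n in sorted(counts.items())}


# Hypothetical measurements from in-line process monitoring.
f = defect_size_pmf([0.2, 0.2, 0.2, 1.2, 2.2, 4.2, 0.2, 1.2])
print(f)
```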



Figure 3. Distribution of size of ac defects

### V. MATHEMATICAL MODEL FOR CALCULATING SYSTEM SENSITIVITY

Consider a circuit i for which the mean path delay is m(i) and the variance of the path delay is z. Suppose an AC defect of size d affects the circuit. The effect of the defect is to change the mean delay of a path passing through the circuit from m(i) to m(i) + d. The delay of this path on any particular module is therefore a random variable, X(d), which is normally distributed with mean m(i) + d and variance z. Let the system cycle time be C. Modules for which X(d) exceeds C fail because they have a path whose delay exceeds the cycle time. Therefore, the probability that the module fails when a defect of size d affects circuit i, Q(i,d), is

$$Q(i,d) = \Pr[X(d) > C] \qquad (2)$$

From the definition of S(i),

$$S(i) = \sum_{d} f(d)\,Q(i,d) \qquad (3)$$

Since X(d) is a normal random variable with mean m(i) + d and variance z, the right-hand side of Equation 2 can be evaluated for specific values of m(i), d, and z by using statistical tables[6] or computer programs[7]. f(d) can be obtained as discussed in section IV, and Equation 3 then gives S(i).
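Because X(d) is normal with mean m(i) + d and standard deviation $\sqrt{z}$, Equation 2 is an upper normal tail, $Q(i,d) = 1 - \Phi\bigl((C - m(i) - d)/\sqrt{z}\bigr)$, where $\Phi$ is the standard normal cumulative distribution function. In place of statistical tables, the sketch below evaluates this tail with Python's `math.erfc` and then applies Equation 3. The function names, the dictionary representation of f(d), and the sample defect distribution are illustrative assumptions.

```python
import math


def q_fail(mean_delay, defect, cycle_time, variance):
    """Equation 2: probability that a path with mean delay m(i), slowed by a
    defect of size d, exceeds the cycle time C. Delay ~ Normal(m(i) + d, z)."""
    sigma = math.sqrt(variance)
    t = (cycle_time - mean_delay - defect) / sigma
    # Upper normal tail: 1 - Phi(t) = 0.5 * erfc(t / sqrt(2)).
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def system_sensitivity(mean_delay, f, cycle_time, variance):
    """Equation 3: S(i) = sum over d of f(d) * Q(i, d).
    f maps defect size d (ns) to its probability f(d)."""
    return sum(prob * q_fail(mean_delay, d, cycle_time, variance)
               for d, prob in f.items())


# Illustrative inputs: C = 22 ns, z = 1.5**2 ns^2, hypothetical f(d).
f = {0.2: 0.5, 1.2: 0.25, 2.2: 0.125, 4.2: 0.125}
s = system_sensitivity(17.0, f, cycle_time=22.0, variance=1.5 ** 2)
print(s)
```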

Suppose the defect size distribution and cycle time are fixed and we want to evaluate system sensitivity. For fixed f(d), z, and C, S(i) depends only on m(i), so all circuits that have the same m(i) have the same S(i). We now describe how this fact can be used to simplify Equation 1 and the computation of system sensitivity.

We can group circuits into classes in which all circuits have the same m(i) value. Let R(x) be the set of values of i for which m(i) = x. A single calculation then determines S(i) for all circuits i in R(x). Let SS(x) denote this common system sensitivity of the circuits in R(x). From section III, w(x) is the number of circuits for which m(i) is equal to x.

Since |R(x)| = w(x), we get

$$\prod_{i \in R(x)} \bigl(1 - p\,S(i)\bigr) = \bigl(1 - p\,SS(x)\bigr)^{w(x)} \qquad (4)$$

Using Equation 4, we can rewrite Equation 1 as

$$\mathrm{ACQL} = 1 - \prod_{x \in D} \bigl(1 - p\,SS(x)\bigr)^{w(x)} \qquad (5)$$

where D is the set of distinct values of m(i). Equation 5 can be simplified by expanding the product and neglecting terms in p² and higher powers of p, which is justified when p is small enough that ACQL itself is small. This gives

$$\mathrm{ACQL} \approx \sum_{x \in D} p\,SS(x)\,w(x) \qquad (6)$$

We define the average system sensitivity of a module, S', by the following equation

$$S' = \frac{1}{N} \sum_{x \in D} SS(x)\,w(x) \qquad (7)$$

Using Equation 7, we can rewrite Equation 6 as

$$\mathrm{ACQL} \approx p\,N\,S' \qquad (8)$$

Therefore, to compute ACQL, we first compute SS(x) for each distinct value x and then either use Equation 5 or, if p is small, use Equations 7 and 8.
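The two routes can be compared on a small example. The sketch below assumes SS(x) has already been computed for each class and that both w(x) and SS(x) are passed as dictionaries keyed by the mean path delay x; the function names and all numbers are hypothetical. The exact form uses `log1p` and `expm1` so that the product of many factors close to 1 stays accurate.

```python
import math


def acql_grouped_exact(p, w, ss):
    """Equation 5: ACQL = 1 - prod over x of (1 - p*SS(x))**w(x).
    Requires p * SS(x) < 1 for every class x."""
    log_good = sum(n * math.log1p(-p * ss[x]) for x, n in w.items())
    return -math.expm1(log_good)  # 1 - exp(log P(good))


def acql_linear(p, w, ss):
    """Equations 7 and 8: ACQL ~= p * N * S', with S' = sum(SS(x) w(x)) / N."""
    n_circuits = sum(w.values())
    s_avg = sum(ss[x] * n for x, n in w.items()) / n_circuits
    return p * n_circuits * s_avg


# Hypothetical module: circuit counts w(x) and sensitivities SS(x) by mean delay x (ns).
w = {15.0: 8000, 16.0: 6000, 17.0: 2800}
ss = {15.0: 0.0002, 16.0: 0.001, 17.0: 0.004}
exact = acql_grouped_exact(1e-4, w, ss)
approx = acql_linear(1e-4, w, ss)
print(exact, approx)
```

Because $1 - \prod(1 - a) \le \sum a$ for probabilities $a$, Equation 8 never underestimates Equation 5; the gap is negligible when ACQL is small.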

The mathematical model gives a quantitative relationship between various design and manufacturing parameters and ACQL. From Equation 8, ACQL is proportional to each of p, N, and S'. If one factor is changed while the remaining factors in Equation 8 are held constant, we can project the effect of that factor on ACQL. For example, if the defect probability doubles, ACQL doubles; if the number of circuits on a module doubles, ACQL also doubles. S' is a function of the defect size distribution, the mean path delay distribution, the path delay variance z, and the cycle time. ACQL can therefore be reduced by reducing S' or p, and various alternative ways of doing so can be evaluated using Equation 8. In the next section we show some numerical examples of how the model can be used to analyze ACQL.

### VI. EXAMPLES OF ACQL ANALYSIS

An APL package is available for ACQL analysis using the mathematical model developed above. The inputs to the APL program are the cycle time, the defect size distribution, and the path delay distribution. We now consider a numerical example to show the output of the APL model.

The module to be analyzed is characterized by the distribution of mean path delay shown in Figure 2. There are 16,800 logic circuits on the module. The cycle time is 22 nanoseconds. The standard deviation of the path delay (the square root of z) is 1.5 nanoseconds. The distribution of the size of AC defects is shown in Figure 3. The probability that a circuit contains an AC defect, p, is 0.0001.

To calculate ACQL, we first calculate the system sensitivity. For each distinct value x of the mean path delay, SS(x) is calculated; the results are shown in Figure 4. The chart shows that the system sensitivity increases exponentially as the mean path delay increases. With all other factors held constant, paths that have a mean delay of 17 ns are 4 times more prone to AC failures than paths that have a mean delay of 16 ns. We discuss the implications of this observation later.

Figure 4. System Sensitivity vs. Mean Path Delay

Using Equation 7, we get S', the average system sensitivity, as 0.00178. This means that, on average, 1.78 module AC failures occur for every 1000 AC defects. From Equation 8 we get

$$\mathrm{ACQL} \approx p\,N\,S' = 0.0001 \times 16{,}800 \times 0.00178 \approx 0.003,$$

or 3 out of every 1000 modules would fail due to AC defects.

We next consider the set of modules that fail due to AC defects and ask what percentage of these failures is due to defects of a given size. The answer is shown in Figure 5.



Figure 5. Module failure breakdown by defect size

Approximately 55% of module AC failures are due to 4.2 ns defects, yet only 1% of defects are 4.2 ns defects (see Figure 3). In contrast, approximately 5% of module AC failures are due to 0.2 ns defects, while 49% of defects are 0.2 ns defects. Per defect, a 4.2 ns defect is therefore roughly (55/1)/(5/49) ≈ 540 times as likely as a 0.2 ns defect to cause a module failure: the larger the defect, the more likely it is to result in an AC failure. To further understand the relationship between defect size and the likelihood of module failure, we calculate the probability that a module fails given that it contains a single defect of a given size. This probability is shown in Figure 6. For small defects, up to 5 nanoseconds, the probability that a single defect causes a module failure increases exponentially with defect size. Thus, for example, lowering the defect size from 3 to 2 nanoseconds can reduce ACQL by a factor of 4.
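Both breakdowns can be derived from quantities the model already defines, assuming (as in section II) that a defect is equally likely to fall on any circuit. With one defect of size d on a random circuit, the module failure probability is the $w(x)/N$-weighted average of $Q(x,d)$; the share of module failures due to size d is $f(d)$ times that probability, normalized over all sizes (the normalizing sum equals $S'$). The sketch below is a hypothetical illustration of this reasoning, not the APL package; the function names and inputs are assumptions, and its outputs are not the paper's figures.

```python
import math


def q_fail(x, d, cycle_time, variance):
    """Q(x, d): upper normal tail of a path with mean delay x + d (Equation 2)."""
    t = (cycle_time - x - d) / math.sqrt(variance)
    return 0.5 * math.erfc(t / math.sqrt(2.0))


def single_defect_failure_prob(d, w, cycle_time, variance):
    """P(module fails | one defect of size d on a uniformly chosen circuit)."""
    n_circuits = sum(w.values())
    return sum(n * q_fail(x, d, cycle_time, variance)
               for x, n in w.items()) / n_circuits


def failure_share_by_defect_size(f, w, cycle_time, variance):
    """Fraction of module AC failures attributable to each defect size d."""
    weighted = {d: prob * single_defect_failure_prob(d, w, cycle_time, variance)
                for d, prob in f.items()}
    total = sum(weighted.values())  # equals S', the average system sensitivity
    return {d: v / total for d, v in weighted.items()}


# Hypothetical inputs: C = 22 ns, path delay standard deviation 1.5 ns.
w = {15.0: 8000, 16.0: 6000, 17.0: 2800}
f = {0.2: 0.49, 1.2: 0.3, 2.2: 0.15, 3.2: 0.05, 4.2: 0.01}
shares = failure_share_by_defect_size(f, w, 22.0, 1.5 ** 2)
print(shares)
```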

When a module fails due to an AC defect, we want to determine the failing path. On a VLSI module there are hundreds of thousands of paths, each with a different mean delay. Designers would like to know what effect the mean path delay has on the likelihood of an AC failure, since it may be easy to make design changes on some paths but not on others. The term w(x) SS(x) in Equation 6 is proportional to the likelihood of an AC failure due to a path whose mean delay is x ns. The conditional probability that, given a module failure, the failure is in a path with mean delay x is

$$\frac{w(x)\,SS(x)}{\sum_{y \in D} w(y)\,SS(y)}$$

This probability is shown in Figure 7. We see that 55% of module failures are due to failures in paths with a mean delay of 17 ns. We also note that the longer the mean path delay, the more likely the path is to fail due to an AC defect.
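The conditional probability above is a normalization of the terms w(x) SS(x). A minimal sketch, assuming w(x) and SS(x) are given as dictionaries keyed by mean path delay; the name `failure_share_by_path_delay` and the numbers are hypothetical. Even with fewer circuits in the 17 ns class, its larger sensitivity gives it the largest share in this made-up example.

```python
def failure_share_by_path_delay(w, ss):
    """P(failing path has mean delay x | module fails)
    = w(x) SS(x) / sum over y of w(y) SS(y)."""
    weighted = {x: n * ss[x] for x, n in w.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no class has nonzero sensitivity")
    return {x: v / total for x, v in weighted.items()}


# Hypothetical class counts w(x) and sensitivities SS(x), keyed by x in ns.
w = {15.0: 8000, 16.0: 6000, 17.0: 2800}
ss = {15.0: 0.0002, 16.0: 0.001, 17.0: 0.004}
shares = failure_share_by_path_delay(w, ss)
print(shares)
```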

We next consider the effect of a change in the variance of the path delay. Changes in materials or in the semiconductor manufacturing process can change this variance. In Figure 8, we compare three different values of the variance of the path delay. The system sensitivity (and hence ACQL) increases exponentially as the variance of the path delay increases.
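The trend can be explored by recomputing the average system sensitivity S' for several path delay standard deviations while holding everything else fixed. The sketch below combines Equations 2, 3, and 7 under hypothetical inputs, so only the direction of the trend, not the magnitudes, should be read from it. In this example every defective path's mean delay (at most 17 + 4.2 ns) is still below the 22 ns cycle time, so a larger standard deviation pushes more of each delay distribution past C.

```python
import math


def average_sensitivity(w, f, cycle_time, sigma):
    """S' (Equation 7), with each SS(x) built from Equations 2 and 3.

    w     -- {mean path delay x (ns): number of circuits w(x)}
    f     -- {defect size d (ns): probability f(d)}
    sigma -- standard deviation of the path delay, sqrt(z), in ns
    """
    n_circuits = sum(w.values())
    total = 0.0
    for x, n in w.items():
        # SS(x): defect-size-weighted upper normal tail beyond the cycle time.
        ss_x = sum(prob * 0.5 * math.erfc((cycle_time - x - d) / (sigma * math.sqrt(2.0)))
                   for d, prob in f.items())
        total += n * ss_x
    return total / n_circuits


# Hypothetical module and defect data; C = 22 ns.
w = {15.0: 8000, 16.0: 6000, 17.0: 2800}
f = {0.2: 0.49, 1.2: 0.3, 2.2: 0.15, 3.2: 0.05, 4.2: 0.01}
sweep = {sigma: average_sensitivity(w, f, 22.0, sigma) for sigma in (1.0, 1.5, 2.0)}
for sigma, s_avg in sweep.items():
    print(f"sigma = {sigma} ns -> S' = {s_avg:.3g}")
```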

### VII. SUMMARY

AC defects in logic chips can cause a multi-chip module to fail. ACQL (the probability that a module fails due to an AC defect) is a function of the path delay distribution, the defect size distribution, the probability of defect occurrence, and the cycle time. An analytic model for projecting ACQL was presented. The model can be used to determine the factors that have the most effect on ACQL. It also allows us to determine what changes in the path delay distribution, the defect size distribution, and the cycle time are required to meet a given ACQL value.

Figure 6. Probability of a Module failure due to a single ac defect

Figure 7. Module failure breakdown by mean path delay

Figure 8. System Sensitivity vs. standard deviation of path delay

# REFERENCES

1. Robert Hitchcock, Sr., Gordon L. Smith, and David D. Cheng, "Timing Analysis of Computer Hardware," IBM J. Res. Develop. 26, 100-105, January 1982.
2. E. P. Hsieh, R. A. Rasmussen, L. J. Vidunas, and W. T. Davis, "Delay Test Generation," in Proc. 14th Design Automation Conference, New Orleans, June 20-22, 1977, pp. 492-494.
3. Donald S. Cleverley, "The Role of Testing in Achieving Zero Defects," Proceedings of the 1982 IEEE International Test Conference, November 1982, pp. 248-253.
4. K. E. Torku and C. E. Radke, "Quality Level and Fault Coverage for Multi Chip Modules," in Proc. 20th Design Automation Conference, Miami Beach, June 27-29, 1983, pp. 201-206.
5. C. C. Beh, K. H. Arya, C. E. Radke, and K. E. Torku, "Do Stuck Fault Models Reflect Manufacturing Defects?," Proceedings of the 1982 IEEE International Test Conference, pp. 35-42.
6. D. B. Owen, "Handbook of Statistical Tables," Addison-Wesley, Reading, Mass., 1962, pp. 3-10.
7. APL Statistical Library, Form No. SH20-1841, IBM Corporation, White Plains, N.Y., 1976, pp. 60-61.