Sunday, October 23, 2011

Floor Planning



The first step in the physical design flow is floorplanning. Floorplanning is the process of identifying structures that should be placed close together and allocating space for them so as to balance the sometimes conflicting goals of available space (cost of the chip), required performance, and the desire to have everything close to everything else.
Based on the area of the design and its hierarchy, a suitable floorplan is decided upon. Floorplanning takes into account the macros used in the design, memories, other IP cores and their placement needs, the routing possibilities, and the area of the entire design. It also decides the IO structure and the aspect ratio of the design. A bad floorplan leads to wasted die area and routing congestion.
In many design methodologies, area and speed are treated as quantities to be traded off against each other. This is probably because routing resources are limited, and the more routing resources a design uses, the slower it operates. Optimizing for minimum area lets the design use fewer resources and also places its sections closer together. This leads to shorter interconnect distances, fewer routing resources used, faster end-to-end signal paths, and even faster and more consistent place-and-route times. Done correctly, floorplanning has no negatives.
As a general rule, data-path sections benefit most from Floorplanning, and random logic, state machines, and other non-structured logic can safely be left to the placer section of the place and route software.
Data paths are typically the areas of your design where multiple bits are processed in parallel with each bit being modified the same way with maybe some influence from adjacent bits. Example structures that make up data paths are Adders, Subtractors, Counters, Registers, and Muxes.

Friday, October 21, 2011

Latch-Up


What is latch up in CMOS design and ways to prevent it?

A problem inherent in p-well and n-well processes is the relatively large number of junctions formed in these structures, and the consequent presence of parasitic diodes and transistors.

Latch-up is a condition in which these parasitic components establish a low-resistance conducting path between VDD and VSS, with disastrous results.

Latch-up may be induced by glitches on the supply rails or by incident radiation.

Latch-up pertains to a failure mechanism wherein a parasitic thyristor (such as a parasitic silicon controlled rectifier, or SCR) is inadvertently created within a circuit, causing a high amount of current to continuously flow through it once it is accidentally triggered or turned on. Depending on the circuits involved, the amount of current flow produced by this mechanism can be large enough to result in permanent destruction of the device due to electrical overstress (EOS).

Preventions for Latch-Up
  • adding well and substrate taps: for example, in an inverter, add an N+ tap in the n-well of the PMOS and connect it to Vdd, and add a P+ tap in the p-substrate of the NMOS and connect it to Vss.
  • increasing substrate doping levels, with a consequent drop in the value of Rs.
  • reducing Rp by control of fabrication parameters and by ensuring a low contact resistance to Vss.
  • introducing guard rings.

Latchup in Bulk CMOS
A byproduct of the bulk CMOS structure is a pair of parasitic bipolar transistors. The collector of each BJT is connected to the base of the other transistor in a positive feedback structure. A phenomenon called latchup can occur when (1) both BJTs conduct, creating a low-resistance path between Vdd and GND, and (2) the product of the gains of the two transistors in the feedback loop, b1 x b2, is greater than one. The result of latchup is at minimum a circuit malfunction and, in the worst case, the destruction of the device.
[Figure: Cross section of parasitic transistors in bulk CMOS]
[Figure: Equivalent circuit]
Latchup may begin when Vout drops below GND due to a noise spike or an improper circuit hookup (Vout is the base of the lateral NPN Q2). If sufficient current I flows through Rsub to turn on Q2 (I x Rsub > 0.7 V), this will draw current through Rwell. If the voltage drop across Rwell is high enough, Q1 will also turn on, and a self-sustaining low-resistance path between the power rails is formed. If the gains are such that b1 x b2 > 1, latchup may occur. Once latchup has begun, the only way to stop it is to reduce the current below a critical level, usually by removing power from the circuit.
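The trigger conditions described above can be written as a quick numeric check. This is only a rough sketch: the function name `latchup_possible`, the currents passed in, and the use of a fixed 0.7 V turn-on threshold are illustrative assumptions, not process data or a sign-off method.

```python
V_BE_ON = 0.7  # volts; approximate base-emitter turn-on voltage quoted in the text


def latchup_possible(i_sub, r_sub, i_well, r_well, beta1, beta2):
    """Return True when both parasitic BJTs can turn on and the loop gain exceeds one.

    i_sub, i_well : currents through Rsub and Rwell in amperes (hypothetical inputs)
    r_sub, r_well : substrate and well resistances in ohms
    beta1, beta2  : current gains of Q1 and Q2
    """
    q2_on = i_sub * r_sub > V_BE_ON    # drop across Rsub forward-biases Q2
    q1_on = i_well * r_well > V_BE_ON  # drop across Rwell forward-biases Q1
    return q2_on and q1_on and beta1 * beta2 > 1.0
```

With these made-up numbers, `latchup_possible(1e-3, 1000, 1e-3, 1000, 5, 0.5)` is true, while lowering Rsub to 100 ohms (the effect guard rings and substrate contacts aim for) makes it false.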
The most likely place for latchup to occur is in pad drivers, where large voltage transients and large currents are present.
Preventing latchup
Fab/Design Approaches

  1. Reduce the gain product b1 x b2
  • moving the n-well and the n+ source/drain farther apart increases the base width of Q2 and reduces its gain b2, but also reduces circuit density
  • a buried n+ layer in the well reduces the gain of Q1
  2. Reduce the well and substrate resistances, producing lower voltage drops
  • a higher substrate doping level reduces Rsub
  • reduce Rwell by making a low-resistance contact to GND
  • guard rings around the p- and/or n-well, with frequent contacts to the rings, reduce the parasitic resistances
[Figure: CMOS transistors with guard rings]
Systems Approaches
  1. Make sure power supplies are off before plugging in a board. A "hot plug-in" of an unpowered circuit board or module may cause signal pins to see surge voltages more than 0.7 V above Vdd, which rises more slowly to its peak value. When the chip comes up to full power, sections of it could be latched.
  2. Carefully protect electrostatic protection devices associated with I/O pads with guard rings. Electrostatic discharge can trigger latchup. ESD enters the circuit through an I/O pad, where it is clamped to one of the rails by the ESD protection circuit. Devices in the protection circuit can inject minority carriers in the substrate or well, potentially triggering latchup.
  3. Radiation, including x-rays, cosmic, or alpha rays, can generate electron-hole pairs as they penetrate the chip. These carriers can contribute to well or substrate currents.
  4. Sudden transients on the power or ground bus, which may occur if large numbers of transistors switch simultaneously, can drive the circuit into latchup. Whether this is possible should be checked through simulation.
References:
http://www.ece.drexel.edu/courses/ECE-E431/latch-up/latch-up.html

Tuesday, October 4, 2011

Delays in ASIC Design




We encounter several types of delays in ASIC design. They are as follows:

·         Gate delay or Intrinsic delay
·         Net delay or Interconnect delay or Wire delay or Extrinsic delay or Flight time
·         Transition or Slew
·         Propagation delay
·         Contamination delay

Wire delays or extrinsic delays are calculated using output drive strength, input capacitance and wire load models. Other delays are intrinsic properties of each and every gate.
Delays depend on a number of interrelated electrical properties [Nekoogar]:

  • Input capacitance of the logic gate is a function of output state, output loads and input slew rate.
  • Internal timing arcs and output slew rate are a function of the switching input(s).
  • Capacitance of the wire is dependent on frequency.
  • Internal timing arcs are a function of input slew rates.
  • Output slew rate is a function of input slew rate on each input.
  • Wires exhibit RLC characteristics instead of lumped RC.

Gate Delay
Transistors within a gate take a finite time to switch. This means that a change on the input of a gate takes a finite time to cause a change on the output. [Magma]
Gate delay = function of (input transition (slew) time, Cnet + Cpin)
or
Gate delay = function of (input transition (slew) time, Cload)
where Cload = Cnet + Cpin
Cnet --> net capacitance
Cpin --> pin capacitance of the driven cell
Cell delay is the same as gate delay.

How is gate delay calculated?
Cell or gate delay is calculated using Non-Linear Delay Models (NLDM). NLDM is highly accurate because it is derived from SPICE characterization. The delay is a function of the input transition time (i.e. slew) of the cell, the wire capacitance, and the pin capacitance of the driven cells. A slow input transition slows the rate at which the cell’s transistors can change state from logic 1 to logic 0 (or logic 0 to logic 1), and a large output load Cload (Cnet + Cpin) takes longer to charge or discharge; both increase the delay of the logic gate.

There is another NLDM table in the library to calculate output transition. Output transition of a cell becomes the input transition of the next cell down the chain.
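The way the output transition of one cell becomes the input transition of the next can be sketched as a simple loop. This is a minimal illustration, not a real delay calculator: `propagate_chain`, `toy_delay` and `toy_transition` are hypothetical names, and the two toy functions are made-up linear stand-ins for the NLDM delay and transition table lookups.

```python
def propagate_chain(stages, input_slew):
    """Accumulate delay along a chain of cells, handing each output slew to the next cell.

    stages: list of (delay_fn, transition_fn, c_net, c_pin) tuples, where the two
    functions take (input_slew, c_load) and stand in for the NLDM table lookups.
    """
    total_delay = 0.0
    slew = input_slew
    for delay_fn, transition_fn, c_net, c_pin in stages:
        c_load = c_net + c_pin                 # Cload = Cnet + Cpin
        total_delay += delay_fn(slew, c_load)  # delay table lookup
        slew = transition_fn(slew, c_load)     # output slew feeds the next stage
    return total_delay, slew


# Hypothetical linear stand-ins for the two tables (not real library data).
def toy_delay(slew, c_load):
    return 0.1 + 0.5 * slew + 2.0 * c_load


def toy_transition(slew, c_load):
    return 0.05 + 0.1 * slew + 4.0 * c_load


stage = (toy_delay, toy_transition, 0.01, 0.01)
total_delay, final_slew = propagate_chain([stage, stage], input_slew=0.1)
```

The second stage is slower than the first even though both cells and loads are identical, because it sees the degraded slew produced by the first stage.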


·         Table models are usually two-dimensional to allow lookups based on the input slew and the output load (Cload). A sample table is given below.

timing() {
  related_pin : "CKN";
  timing_type : falling_edge;
  timing_sense : non_unate;
  cell_rise(delay_template_7x7) {
    index_1 ("0.012, 0.032, 0.074, 0.154, 0.318, 0.644, 1.3");
    index_2 ("0.001278, 0.0046008, 0.0112464, 0.0245376, 0.05112, 0.10454, 0.212148");
    values ( \
      "0.225894, 0.249015, 0.285537, 0.352680, 0.484244, 0.748180, 1.279570", \
      "0.231295, 0.254415, 0.290938, 0.358081, 0.489646, 0.753585, 1.284980", \
      "0.243754, 0.266878, 0.303398, 0.370542, 0.502105, 0.766044, 1.297440", \
      "0.267240, 0.290389, 0.326908, 0.394052, 0.525615, 0.789561, 1.320950", \
      "0.307080, 0.330200, 0.366721, 0.433861, 0.565425, 0.829373, 1.360760", \
      "0.380552, 0.403875, 0.440426, 0.507569, 0.639136, 0.903084, 1.434500", \
      "0.497588, 0.521769, 0.558548, 0.625744, 0.757301, 1.021260, 1.552680");
  }
  rise_transition(delay_template_7x7) {
    index_1 ("0.012, 0.032, 0.074, 0.154, 0.318, 0.644, 1.3");
    index_2 ("0.001278, 0.0046008, 0.0112464, 0.0245376, 0.05112, 0.10454, 0.212148");
    values ( \
      "0.040574, 0.068619, 0.125391, 0.246672, 0.497688, 1.005982, 2.030120", \
      "0.040570, 0.068618, 0.125390, 0.246672, 0.497688, 1.005940, 2.030240", \
      "0.040565, 0.068616, 0.125389, 0.246650, 0.497770, 1.006180, 2.030120", \
      "0.040532, 0.068612, 0.125387, 0.246670, 0.497710, 1.006164, 2.030100", \
      "0.040578, 0.068621, 0.125392, 0.246636, 0.497688, 1.006182, 2.030040", \
      "0.041763, 0.069211, 0.125662, 0.246758, 0.497726, 1.005930, 2.030000", \
      "0.045813, 0.071321, 0.126671, 0.247154, 0.497846, 1.005962, 2.030180");
  }
}


index_1 --> input transition values
index_2 --> output load capacitance values
values --> delay values (in cell_rise) or output transition values (in rise_transition)

Situation 1:
Input transition and output load values match with table index values

If both the input transition and the output load match table index values, the corresponding delay is picked directly from the “values” table.

Situation 2:
Output load values do not match with table index values

·         When the actual load capacitance does not fall exactly on one of the load-axis index points, the delay is determined by interpolation between the closest points. Note that this one-dimensional interpolation applies when the input transition matches one of the table index values.
·         Determine the equation for the line segment connecting the two nearest points in the table.


To do this, first find the slope.
Slope m = (y2-y1)/(x2-x1), where (y2-y1) is the delay segment (generally in ns) on the y-axis and (x2-x1) is the load segment (generally in pF) on the x-axis.
·         Solve for the delay at the load point of interest.

The linear equation is:
y = mx+c
where
y-->delay (ns)
m-->slope
x-->load capacitance (pF)
c-->intercept, fixed by either known table point: c = y1-m*x1

i.e. delay = y1 + slope*(load point of interest - x1). The constant c is generally not zero.

Load point of interest means the load capacitance value for which the delay has to be calculated.
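As a worked example, the sketch below interpolates between the two nearest load points using the line through them (point-slope form). The function name `interpolate_delay` and the chosen load of 0.003 are illustrative assumptions; the four table numbers come from the first row of the cell_rise table above (input transition 0.012).

```python
def interpolate_delay(load, load_lo, load_hi, delay_lo, delay_hi):
    """Linear interpolation of delay between two neighbouring load index points."""
    slope = (delay_hi - delay_lo) / (load_hi - load_lo)  # m = (y2 - y1) / (x2 - x1)
    return delay_lo + slope * (load - load_lo)           # y = y1 + m * (x - x1)


# Input transition 0.012 matches index_1 exactly; load 0.003 lies between
# index_2 points 0.001278 and 0.0046008.
delay = interpolate_delay(0.003, 0.001278, 0.0046008, 0.225894, 0.249015)
```

The result is roughly 0.2379 in the library's time unit, which lies between the two table entries as expected.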

Situation 3:
Both input transition and output load values do not match with table index values

·         If both input transition and load capacitance values do not match exactly with the look up table index values then bilinear interpolation is used.
·         Three linear interpolations are performed on the four closest table data points: two along one axis, then one along the other between the two intermediate results.
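A minimal sketch of bilinear interpolation as three linear interpolations over four table points follows. The helper names `lerp` and `bilinear_delay` and the query point (slew 0.02, load 0.003) are assumptions for illustration; the four corner values are taken from the first two rows and first two columns of the cell_rise table above. Real tools may order the interpolations differently or use other fitting schemes.

```python
def lerp(x, x0, x1, y0, y1):
    """Linear interpolation of y at x between (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def bilinear_delay(slew, load, slew_lo, slew_hi, load_lo, load_hi,
                   d_ll, d_lh, d_hl, d_hh):
    """d_xy is the table delay at slew index x and load index y (l = low, h = high)."""
    at_slew_lo = lerp(load, load_lo, load_hi, d_ll, d_lh)  # 1st: along load, lower slew row
    at_slew_hi = lerp(load, load_lo, load_hi, d_hl, d_hh)  # 2nd: along load, upper slew row
    return lerp(slew, slew_lo, slew_hi, at_slew_lo, at_slew_hi)  # 3rd: along slew


delay = bilinear_delay(
    slew=0.02, load=0.003,
    slew_lo=0.012, slew_hi=0.032,
    load_lo=0.001278, load_hi=0.0046008,
    d_ll=0.225894, d_lh=0.249015,
    d_hl=0.231295, d_hh=0.254415,
)
```

The result is close to 0.2400, inside the range spanned by the four corner values.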

Situation 4:
Output load values do not match with table index values and are outside the table boundary

·         When the load point is outside the boundary of the index, the delay is extrapolated from the closest known points.
·         A lookup value too far outside the range of the table can lead to inaccuracy. [Cadence]
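The sketch below shows one way extrapolation could work: reuse the line through the two table points nearest the boundary and flag the result so the caller can treat it with suspicion. The function name `lookup_1d` and the query load of 0.25 are hypothetical; the index and delay values are the index_2 axis and the first row of the cell_rise table above. How a particular tool extrapolates, or whether it clamps instead, is tool-specific and not covered here.

```python
from bisect import bisect_right


def lookup_1d(x, xs, ys):
    """Return (value, extrapolated) for x on a sorted axis xs with samples ys.

    Inside the table this interpolates between the bracketing points; outside,
    it extends the first or last segment.
    """
    i = bisect_right(xs, x) - 1
    i = max(0, min(i, len(xs) - 2))  # clamp to the first/last segment
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    value = ys[i] + slope * (x - xs[i])
    extrapolated = x < xs[0] or x > xs[-1]
    return value, extrapolated


loads = [0.001278, 0.0046008, 0.0112464, 0.0245376, 0.05112, 0.10454, 0.212148]
delays = [0.225894, 0.249015, 0.285537, 0.352680, 0.484244, 0.748180, 1.279570]

value, extrapolated = lookup_1d(0.25, loads, delays)  # load beyond the last index point
```

Here `extrapolated` is true and the value lands above the last table entry; the further the load is from 0.212148, the less the straight-line assumption can be trusted.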

Intrinsic delay

·         Intrinsic delay is the delay internal to the gate. This is from input pin of the cell to output pin of the cell.
·         It is defined as the delay between an input and output pair of a cell when a near-zero slew is applied to the input pin and the output sees no load. It is caused by the internal capacitance associated with the cell's transistors.
·         This delay depends largely on the size of the transistors forming the gate, because increasing the transistor size increases the internal capacitance.

References
[Nekoogar] Farzad Nekoogar, “Timing Verification of Application Specific Integrated Circuits”, Prentice Hall
[Magma] Magma Blast Fusion User Guides
[Cadence] Cadence SOC Encounter User Guides

What is Static Timing Analysis (STA)


What is Static Timing Analysis (STA)?

In Static Timing Analysis (STA), static delays such as gate delays and net delays are considered along each path, and these delays are compared against their required maximum and minimum values. The circuit to be analyzed is broken into timing paths consisting of gates, flip-flops, and their interconnections. Each timing path has to process its data within a clock period, which is determined by the maximum frequency of operation. Cell delays are available in the corresponding technology libraries; the values are tabulated against input transition and fanout load and are characterized by SPICE simulation. Net delays are calculated from Wire Load Models (WLM) or from extracted resistance R and capacitance C. Wire Load Models (WLM) are available in the Technology File. These values are Table Look Up (TLU) values calculated based on the net fanout length.
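The comparison of path delays against required maximum and minimum values can be sketched for a single register-to-register path. The function names, parameters and example numbers below are assumptions for illustration; the formulas are the common textbook setup and hold checks for a single-cycle path, with `clock_skew` taken as capture clock arrival minus launch clock arrival.

```python
def setup_slack(clock_period, clk_to_q, path_delay_max, setup_time, clock_skew=0.0):
    """Slack of the maximum-delay (setup) check; negative means a violation."""
    arrival = clk_to_q + path_delay_max                 # latest data arrival
    required = clock_period + clock_skew - setup_time   # must arrive before this
    return required - arrival


def hold_slack(clk_to_q, path_delay_min, hold_time, clock_skew=0.0):
    """Slack of the minimum-delay (hold) check; negative means a violation."""
    arrival = clk_to_q + path_delay_min                 # earliest data arrival
    required = hold_time + clock_skew                   # must not change before this
    return arrival - required


# Hypothetical numbers in ns: 2.0 ns clock, 0.2 ns clock-to-Q.
s = setup_slack(clock_period=2.0, clk_to_q=0.2, path_delay_max=1.5, setup_time=0.1)
h = hold_slack(clk_to_q=0.2, path_delay_min=0.1, hold_time=0.05)
```

Both slacks are positive for these values, so this made-up path would meet timing.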

The static timing analyzer will report the following delays (or it can do following analysis):
Register to Register delays
Setup times of all external synchronous inputs
Clock to Output delays
Pin to Pin combinational delays
Different Analysis Modes-Best, Worst, Typical, On Chip Variation (OCV)
Data to Data Checks
Case Analysis
Multiple Clocks per Register
Minimum Pulse Width Checks
Derived Clocks
Clock Gating Checks
Netlist Editing
Report_clock_timing
Clock Reconvergence Pessimism
Worst-Arrival Slew Propagation
Path-Based Analysis
Debugging Delay Calculation

and many more......!!

The widespread use of STA can be attributed to several factors [David]:

  • The basic STA algorithm is linear in runtime with circuit size, allowing analysis of designs in excess of 10 million instances.
  • The basic STA analysis is conservative in the sense that it over-estimates the delay of long paths in the circuit and under-estimates the delay of short paths. This makes the analysis "safe", guaranteeing that the design will function at least as fast as predicted and will not suffer from hold-time violations.
  • The STA algorithms have become fairly mature, addressing critical timing issues such as interconnect analysis, accurate delay modeling, false or multi-cycle paths, etc.
  • Delay characterization for cell libraries is clearly defined, forms an effective interface between the foundry and the design team, and is readily available. In addition to this, the Static Timing Analysis (STA) does not require input vectors and has a runtime that is linear with the size of the circuit [Agarwal].
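The linear runtime and the conservative treatment of long and short paths can both be seen in a small sketch of arrival-time propagation over a timing graph. This is an illustrative toy under stated assumptions, not the algorithm of any particular tool: `propagate_arrivals`, the edge tuple layout and the example graph are hypothetical, and the graph is assumed to be acyclic with string node names.

```python
from collections import defaultdict, deque


def propagate_arrivals(edges, start_times):
    """Propagate earliest and latest arrival times through an acyclic timing graph.

    edges: list of (src, dst, min_delay, max_delay)
    start_times: arrival time at source nodes (nodes with no incoming edge);
                 sources missing from the dict default to 0.0
    Each node and edge is visited once, so the work grows linearly with graph size.
    """
    fanout = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set(start_times)
    for src, dst, d_min, d_max in edges:
        fanout[src].append((dst, d_min, d_max))
        indegree[dst] += 1
        nodes.update((src, dst))

    earliest, latest = {}, {}
    ready = deque()
    for n in sorted(nodes):
        if indegree[n] == 0:
            earliest[n] = latest[n] = start_times.get(n, 0.0)
            ready.append(n)

    while ready:  # topological order: a node is processed after all of its fanin
        node = ready.popleft()
        for dst, d_min, d_max in fanout[node]:
            early = earliest[node] + d_min
            late = latest[node] + d_max
            earliest[dst] = min(earliest.get(dst, early), early)  # short-path bound
            latest[dst] = max(latest.get(dst, late), late)        # long-path bound
            indegree[dst] -= 1
            if indegree[dst] == 0:
                ready.append(dst)
    return earliest, latest


edges = [
    ("in1", "g1", 0.1, 0.2),
    ("in2", "g1", 0.1, 0.3),
    ("g1", "out", 0.2, 0.4),
    ("in2", "out", 0.5, 0.6),
]
earliest, latest = propagate_arrivals(edges, {"in1": 0.0, "in2": 0.0})
```

For this graph the latest arrival at `out` is about 0.7 and the earliest about 0.3; the first number would feed setup checks and the second hold checks.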

Advantages of STA:

  • All timing paths are considered for the timing analysis. This is not the case in simulation.
  • Analysis times are relatively short when compared with event and circuit simulation.
  • Timing can be analyzed for the worst case and the best case simultaneously. This type of analysis is not possible in dynamic timing analysis (DTA).
  • Static Timing Analysis (STA) works with timing models. STA has more pessimism and thus gives the maximum delay of the design. DTA performs full timing simulation. The problem associated with DTA is the computational complexity involved in finding the input patterns (vectors) that produce maximum delay at the output, and hence it is slow.

Disadvantages of STA:
  • Not all paths in the design always run at their worst-case delay. Hence the analysis is pessimistic.
  • All clock-related information has to be supplied to the tool in the form of constraints.
  • Inconsistent, incorrect, or incomplete (under-constrained) constraints may lead to disastrous timing analysis.
  • STA does not check for logical correctness of the design.
  • STA is not suitable for asynchronous circuits.


References

[David] David Blaauw, Kaviraj Chopra, Ashish Srivastava and Lou Scheffer, “Statistical Timing Analysis: From basic principles to state-of-the-art.”, Transactions on Computer-Aided Design of Integrated Circuits and Systems (T-CAD), IEEE. 
[Agarwal] A. Agarwal, D. Blaauw, V. Zolotov, S. Sundareswaran, M. Zhao, K. Gala and R. Panda, “Statistical delay computation considering spatial correlations,” Proceedings of the ASP-DAC 2003, pp. 271-276, Jan 2003.