Technion, EE department
Dr. Avi Mendelson, Intel
Low Power Architectures
Lecture #1: Introduction
Dr. Avi Mendelson
[email protected]
Contributors: Ronny Ronen, Eli Savransky, Shekhar Borkar, Fred Pollack
Dr. Avi Mendelson Page 2
Why Power meter?

Moore's Law
For many years, technology has obeyed Moore's Law.
[Figure: transistors per die (log scale, 10^0 to 10^8) versus year, '70 to 2000. Memory: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Microprocessor: 4004, 8080, 8086, 80286, i386™, i486™, Pentium®, Pentium® Pro, Pentium® II, Pentium® III. Source: Intel]

In the Last 25 Years Life Was Easy (*)
Doubling of transistor density every 30 months
Increasing die sizes, allowed by
– Increasing wafer size
– Process technology moving from "black art" to "manufacturing science"
⇒ Doubling of transistors every 18 months

Tech    Old µArch      mm (linear)   New µArch      mm (linear)   Area ratio
1.0µ    i386C          6.5           i486           11.5          3.1
0.7µ    i486C          9.5           Pentium®       17            3.2
0.5µ    Pentium®       12.2          Pentium® Pro   17.3          2.1
0.18µ   Pentium® III   10.3          Next Gen       ?             2-3

Implications (in the same technology):
1. New µArch ~ 2-3X die area of the last µArch
2. Provides 1.5-1.7X integer performance of the last µArch
(*) Source: Fred Pollack, Micro-32

Suddenly, the power monster appears

The power crisis – power consumption
Source: cool-chips, Micro 32

Processor Power Evolution
New generation: always increases power
Compactions: higher performance at lower power
One size fits all: start with the high-power segment and shrink it to Mobile
[Figure: max power (Watts, log scale 1 to 100) versus process technology (1.5µ, 1µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ) for i386, i486, Pentium®, Pentium® w/MMX tech., Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4, and "?" for the next generation]

The power crisis: Power Density
A real threat to Moore's Law
Think of watts/cm²
Power is not distributed evenly over the chip. A failure can happen if a single point reaches the maximum power point.
Complex algorithms lead to denser power:
– Dense random logic
Timing pressure leads to faster/bigger/power-hungrier gates
– Designers put together units that communicate with each other. This creates "regions" with high activity factors -> hot spots.

Power Density
[Figure: power density (Watts/cm², log scale 1 to 1000) versus process technology (1.5µ, 1µ, 0.7µ, 0.5µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ, 0.1µ, 0.07µ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, and Pentium® 4, with reference levels marked for a hot plate, a nuclear reactor, and a rocket nozzle]
* "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp. Micro32 conference keynote, 1999.

Some implications
We can't build microprocessors with ever-increasing power density and die sizes
– The constraint is power, not manufacturability
The design of any future microprocessor should take power into consideration. We need to distinguish between different aspects of power:
– Power delivery
– Max power (TJ)
– Power density (hot spots)
– Energy (static + dynamic)
Power- and energy-aware design should take care of each of these aspects
One size does not fit all anymore

Why power and power density increase over time

Basic terminology

Power and the digital world…
Power is consumed:
– When capacitance is charged and discharged
– A charged cap is a logical '1', a discharged cap is '0'
The capacitance can be the gates of other transistors or wires (buses and long interconnects)
[Figure: gate with IN and OUT switching between 0 and 1]
$E = CV^2$

Power and the digital world (2)…
Secondary effects like leakage and short-circuit current are increasing with advanced process technologies
Leakage is growing dramatically
– 7% now, expect 20% in the next process technology, 50% in the one after
– … Unless we do something (and we will)
[Figure: two gates, one with IN at 1/2 and one with IN = 0, OUT = 1; panels labeled Leakage (sub-threshold) and Short-circuit]

Power & Energy
Energy
"The capacity for doing work" *
Important for
– Battery life: lower energy per task ⇒ longer battery life
– Electric bills: lower energy per task ⇒ lower bills
Measured over time
Proportional to the overall capacitance and to the voltage squared ($CV^2$)
* Merriam-Webster's Collegiate® Dictionary, http://www.m-w.com/

Power & Energy
Power
Work done per time unit
– Measured in Watts
$P = \alpha C V^2 f$
($\alpha$: activity, C: capacitance, V: voltage, f: frequency)
"Measured" at peak time
Higher power ⇒ higher current
– Cannot exceed platform power delivery constraints
Higher power ⇒ higher temperature
– Cannot exceed the thermal constraints
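
The two formulas on the terminology slides, $E = CV^2$ and $P = \alpha C V^2 f$, can be tried out numerically. The following is a minimal sketch; the function names and the operating point (20 nF of switched capacitance, 10% activity, 1.5 V, 1 GHz) are illustrative assumptions, not figures from the lecture.

```python
def switching_energy(c_farads, v_volts):
    """Energy to charge and discharge a capacitance once: E = C * V^2 (joules)."""
    return c_farads * v_volts ** 2


def dynamic_power(alpha, c_farads, v_volts, f_hertz):
    """Dynamic power: P = alpha * C * V^2 * f (watts)."""
    return alpha * switching_energy(c_farads, v_volts) * f_hertz


# Hypothetical operating point (an assumption, not from the lecture):
# 20 nF total switched capacitance, 10% activity factor, 1.5 V, 1 GHz.
power_w = dynamic_power(alpha=0.1, c_farads=20e-9, v_volts=1.5, f_hertz=1e9)
print(f"dynamic power: {power_w:.2f} W")
```

Because voltage enters squared, halving V in this sketch cuts power to a quarter at the same frequency, while halving f only cuts it in half.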

Voltage, Power, Frequency
Transistor switches faster at higher voltage
– Higher voltage enables higher frequency
Maximum frequency grows about linearly with voltage…
Within a given voltage range Vmin-Vmax
– V < Vmin: transistors won't switch
– V > Vmax: the device may burn
"The cube law": $P = kV^3$ (or ~1% V = 3% P)
[Figure: XScale processor frequency (MHz) and power (mWatt), scale 0 to 1000, versus voltage (0.5 to 1.9) *]
* Source: Intel Corp. (http://developer.intel.com)

The scalability theory – process technology
Target (ideally): each generation (2-3 years)
1. Reduce gate delay by 30% ⇒ ~50% freq gain
2. Increase density by 2x
– 0.7 shrink on a side, 50% area reduction on compaction
– Transistor Z and L shrink by 30%
– Interconnect pitches shrink by ~30%
– Add metal layers to make up for (1) pitches shrinking by less than 30%, and (2) RC

Scaling theory (1 of 2)
Lateral and vertical dimensions reduce 30%:
$\text{Width} = W = 0.7,\quad \text{Length} = L = 0.7,\quad t_{ox} = 0.7$
Capacitance (area and fringing) reduces 30%:
$\text{Area Cap} = C_a = \frac{0.7 \times 0.7}{0.7} = 0.7,\quad \text{Fringing Cap} = C_f = 0.7 \;\Rightarrow\; \text{Total Cap} = C = 0.7$
Die area reduces 50%:
$\text{Die Area} = X \times Y = 0.7 \times 0.7 = 0.7^2$
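
Scaling theory (2 of 2)
Capacitance per transistor reduces 30%:
$\frac{\text{Cap}}{\text{Transistor}} = \frac{0.7}{1} = 0.7$
Capacitance per unit area increases 43%:
$\frac{\text{Cap}}{\text{Area}} = \frac{0.7}{0.7 \times 0.7} = \frac{1}{0.7}$
Delay reduces 30%, power reduces 50%:
$V_{dd} = 0.7,\quad V_t = 0.7,\quad I = \frac{W}{t_{ox}}(V_{dd} - V_t) = \frac{0.7}{0.7} \times 0.7 = 0.7$
$T = \frac{C\,V_{dd}}{I} = \frac{0.7 \times 0.7}{0.7} = 0.7,\quad \text{Power} = C V^2 f = 0.7 \times 0.7^2 \times \frac{1}{0.7} = 0.7^2$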
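
The ideal scaling arithmetic can be replayed in a few lines. This is a sketch under the slide's own assumptions (every linear dimension and both voltages scale by 0.7); the function and key names are made up for illustration, and all values are ratios of new to old.

```python
def scale_one_generation(s=0.7):
    """Relative (new/old) values after one ideal process generation."""
    width = length = t_ox = s           # lateral and vertical dimensions
    vdd = s                             # supply voltage; threshold scales by s too
    cap = width * length / t_ox         # area capacitance ~ W*L/t_ox
    current = (width / t_ox) * s        # I ~ (W/t_ox)*(Vdd - Vt), voltages scale by s
    delay = cap * vdd / current         # T ~ C*Vdd/I
    freq = 1.0 / delay
    power = cap * vdd ** 2 * freq       # P ~ C*V^2*f
    area = width * length
    return {
        "cap_per_transistor": cap,
        "cap_per_area": cap / area,
        "delay": delay,
        "freq": freq,
        "power": power,
        "power_density": power / area,
    }


for name, ratio in scale_one_generation().items():
    print(f"{name}: {ratio:.2f}X")
```

The power density ratio comes out at 1.0X: power and area both shrink by 0.7², which is why ideal scaling leaves power density unchanged.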

Process Technology – the Enabler
Every process generation (every 2-3 years), ideally:
– Shorten gate delay by 30% ⇒ ~50% (100/70) frequency gain
– Vdd scaled down by ~30%
Results:
» 2/3 reduction in energy/transition ($CV^2 \rightarrow 0.7 \times 0.7^2 = 0.34\text{X}$)
» 1/2 reduction in power ($CV^2 f \rightarrow 0.7 \times 0.7^2 \times 1.5 = 0.5\text{X}$)
» Power density unchanged

Ideal Scenarios...
Ideal "Shrink"
– Same µarch
– 1X #Xistors
– 0.5X size
– 1.5X frequency
– 0.5X power
– 1X IPC (instr./cycle)
– 1.5X performance
– 1X power density
Ideal New µarch
– Same die size
– 2X #Xistors
– 1X size
– 1.5X frequency
– 1X power
– 2X IPC
– 3X performance
– 1X power density
Looks good, doesn't it?

Process Technologies – Reality
But in reality:
– New designs squeeze frequency to 2X per process
– New designs use more transistors (2X-3X to get 1.5X-1.7X perf)
So, every new process and architecture generation:
– Power goes up about 2X
– Power density goes up 30%~80%
This is bad, and…
Will get worse in future process generations:
– Voltage (Vdd) will scale down less
– Leakage is going through the roof
Not as good as it first looked… Aha?
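
One way to see where "about 2X" and "30%~80%" can come from is to keep the ideal 0.7 scaling of capacitance and Vdd, but plug in the real 2X frequency and 2X-3X transistor counts. The sketch below rests on that assumption, which the slide does not spell out; leakage is ignored and the function name is hypothetical.

```python
def reality_check(transistor_growth, freq_gain=2.0, s=0.7):
    """Relative power and power density for one process + architecture generation.

    Assumes (not stated on the slide) that capacitance per transistor and Vdd
    still scale ideally by s, and that only frequency and transistor count
    deviate from the ideal scenario.
    """
    power_per_transistor = s * s ** 2 * freq_gain   # C * V^2 * f
    power = power_per_transistor * transistor_growth
    area = transistor_growth * s ** 2                # each transistor takes s^2 area
    return power, power / area


for growth in (2.0, 3.0):
    power, density = reality_check(growth)
    print(f"{growth:.0f}X transistors: power {power:.2f}X, power density {density:.2f}X")
```

Under these assumptions power grows by roughly 1.4X to 2.1X and power density by about 1.4X per generation, in line with the ranges quoted above. Less-than-ideal voltage scaling pushes both numbers higher.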

In Picture…
[Figure: processors by silicon process technology (1.5µ, 1.0µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ): Intel386™ DX, Intel486™ DX, Pentium®, Pentium® Pro, Pentium® II, Pentium® III, Pentium® 4]

Performance Efficiency of µarchitectures
Implications (in the same technology):
1. New µarch ~2-3X die area of the last µarch
2. Provides 1.4-1.8X integer performance of the last µarch
We are on the Wrong Side of a Square Law

Tech    Old µArch           mm (linear)   New µArch           mm (linear)   Area ratio   Perf ratio
1.0µ    i386C               6.5           i486                11.5          3.1
0.7µ    i486C               9.5           Pentium® proc       17            3.2          1.8
0.5µ    Pentium® proc       12.2          Pentium® Pro proc   17.3          2.1          1.5
0.18µ   Pentium® III proc   10.3          Pentium® 4 proc     14.7          2            1.4

[Figure: bar chart of area ratio and performance ratio (0 to 3.5) for 486=>PP, PP=>Ppro, PIII=>P4P]
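
The "square law" remark is commonly read as performance growing roughly with the square root of die area. That reading is an assumption here, since the slide does not give the formula. The sketch below recomputes the area ratios from the linear die dimensions in the table and compares the square root of each with the reported performance ratio. The i386C to i486 row is left out because the table lists no performance ratio for it.

```python
import math

# (transition, old die edge in mm, new die edge in mm, reported perf ratio),
# taken from the table on this slide.
TRANSITIONS = [
    ("i486C -> Pentium", 9.5, 17.0, 1.8),
    ("Pentium -> Pentium Pro", 12.2, 17.3, 1.5),
    ("Pentium III -> Pentium 4", 10.3, 14.7, 1.4),
]


def area_ratio(old_mm, new_mm):
    """Area ratio of two square dies given their edge lengths."""
    return (new_mm / old_mm) ** 2


for name, old_mm, new_mm, perf in TRANSITIONS:
    area = area_ratio(old_mm, new_mm)
    print(f"{name}: area {area:.2f}X, sqrt(area) {math.sqrt(area):.2f}X, "
          f"reported perf {perf}X")
```

In all three rows the reported performance ratio stays close to the square root of the area ratio, and well below the area ratio itself.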

Power Evolution (Theoretical)
For a 15mm/side die (225mm²)
Assume 2X frequency increase each generation
Future process numbers are estimated
[Figure: leakage power and active power (Watts, 0 to 250) and power density (W/cm², 0 to 100) versus process technology (0.25µ, 0.18µ, 0.13µ, 0.1µ)]

More aspects of power aware architectures

The "Power Bottleneck" (Hot Spots)
The Thermal Story
Silicon is not a good heat conductor
With high power density, cannot assume power uniformity
– High temperature ⇒ high leakage ⇒ high power ⇒ higher temperature
Artificially expanding the die size does not help. Must attack the hot spots
Smart layout that separates the hot units increases the processor's power envelope!
[Figure: two panels. Power Map: heat flux (W/cm², 0 to 250). On-Die Temperature: temperature (40 to 110)]
* "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp. Micro32 conference keynote, 1999.

The Mythical Power Envelope
CPU "power envelope"
– Maximum power that commercial cooling technologies can dissipate
Limited by
– Total system power
– Processor power. Typical figures: Server <130W, Desktop 50-80W, Notebook 20-30W, sub-notebook <10W
Bigger systems cool better and dissipate more power
– Heat sinks
– Heat pipes
– Better TIM (Thermal Interface Materials)
Average power density matters:
– Uniformly distributed power allows for higher CPU dissipation

Energy Efficiency
Energy per task
– Proportional to # of processed insts. per task
– Proportional to the average work consumed per instruction
– Deteriorates as speculation increases and complexity grows
Or formally, for a given task,
– Energy per retired instruction is $\beta \cdot W$, where
» $\beta$: ratio of Total to Retired number of processed instructions
» W: average energy spent in processing an instruction
Both figures grow with every new micro-architecture
In that respect: high-performance modern micro-architectures are less energy-efficient
Luckily, process technology offsets that by reducing energy per switch
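
A small numeric illustration of $\beta \cdot W$ follows. It is a sketch only: the function name and the instruction counts and energies are hypothetical values chosen to show how speculation (a larger $\beta$) and complexity (a larger W) compound.

```python
def energy_per_retired_instruction(total_insts, retired_insts, avg_energy_per_inst):
    """beta * W, where beta = total processed / retired instructions."""
    if retired_insts <= 0:
        raise ValueError("retired_insts must be positive")
    beta = total_insts / retired_insts
    return beta * avg_energy_per_inst


# Hypothetical numbers (assumptions, not measurements), energies in nJ:
# an older core processes 130 instructions per 100 retired at 1.0 nJ each,
# a more speculative, more complex core processes 160 per 100 at 1.2 nJ each.
older = energy_per_retired_instruction(130, 100, 1.0)
newer = energy_per_retired_instruction(160, 100, 1.2)
print(f"older: {older:.2f} nJ, newer: {newer:.2f} nJ per retired instruction")
```

In this made-up case a 23% rise in $\beta$ and a 20% rise in W combine into roughly 48% more energy per retired instruction.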

Voltage Scaling
Within a given voltage range, higher voltage allows higher freq.
Used for trading power and frequency. Either
– Statically, at manufacturing time
– Dynamically, at run time (e.g., Intel's SpeedStep® Technology)
Actual range depends on specific design and process technology
Examples*:
– Intel® XScale™ processors run from 0.75V (150MHz/50mW) to 1.65V (800MHz/900mW)
– Intel mobile Pentium® III processor sells from 1.1V (600MHz) to 1.7V (1GHz)
[Figure: XScale processor frequency (MHz) and power (mWatt), scale 0 to 1000, versus voltage (0.5 to 1.9)]
* Source: Intel Corp. (http://developer.intel.com)
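
The two XScale operating points quoted on this slide are enough for a quick energy-per-cycle comparison. The sketch below uses only those figures; the dictionary layout and function name are illustrative, and the comparison with a pure $V^2$ model is a rough sanity check rather than a fit.

```python
# XScale operating points quoted on the slide.
XSCALE_POINTS = {
    "low": {"volts": 0.75, "hz": 150e6, "watts": 50e-3},
    "high": {"volts": 1.65, "hz": 800e6, "watts": 900e-3},
}


def energy_per_cycle(point):
    """Average energy per clock cycle in joules: power / frequency."""
    return point["watts"] / point["hz"]


low, high = XSCALE_POINTS["low"], XSCALE_POINTS["high"]
speedup = high["hz"] / low["hz"]
power_ratio = high["watts"] / low["watts"]
energy_ratio = energy_per_cycle(high) / energy_per_cycle(low)
v_squared_ratio = (high["volts"] / low["volts"]) ** 2

print(f"frequency {speedup:.2f}X, power {power_ratio:.0f}X")
print(f"energy per cycle {energy_ratio:.3f}X, V^2 alone predicts {v_squared_ratio:.2f}X")
```

Running at the high point buys about 5.3X frequency for 18X power. The energy per cycle rises less than $V^2$ alone would suggest, which is expected from a two-point comparison that lumps together everything the simple model leaves out.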

Alpha hot spots
Source: CoolChips-99
Area 30%
Freq. 50%
Power 67%

Voltage Scaling (cont.)
Huge effect on power:
20% freq reduction ⇒ 20% voltage reduction
– 35% energy reduction ($\alpha C V^2 \rightarrow \alpha C \cdot 0.8^2 = 0.64\,\alpha C$)
– 50% power reduction ($\alpha C V^2 f \rightarrow \alpha C \cdot 0.8^3 = 0.51\,\alpha C$)
Even more impressive if we recall:
– 20% freq hit ⇒ only 10%-15% performance hit*
Voltage scaling can be used to trade performance with power!
* Depends mainly on core-to-bus frequency ratio and cache size.
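
The trade described on this slide can be tabulated directly. This is a minimal sketch that assumes voltage is scaled by the same factor as frequency, as the slide does, and takes the performance hit as an input; the function name and the derived performance-per-watt figure are illustrative additions.

```python
def voltage_scaling_tradeoff(freq_scale, perf_hit):
    """Relative energy, power and performance when V scales together with f.

    freq_scale: new/old frequency (voltage is assumed to scale by the same factor).
    perf_hit:   fractional performance loss, e.g. 0.10 for a 10% hit.
    """
    energy = freq_scale ** 2          # alpha*C*V^2
    power = freq_scale ** 3           # alpha*C*V^2*f
    perf = 1.0 - perf_hit
    return {"energy": energy, "power": power, "perf": perf,
            "perf_per_watt": perf / power}


for hit in (0.10, 0.15):
    r = voltage_scaling_tradeoff(freq_scale=0.8, perf_hit=hit)
    print(f"perf hit {hit:.0%}: energy {r['energy']:.2f}X, power {r['power']:.2f}X, "
          f"perf/watt {r['perf_per_watt']:.2f}X")
```

With the slide's 10%-15% performance hit, performance per watt improves by roughly 1.7X while power is cut almost in half.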

What is the impact of the computer architecture?

Power and performance trade-off
Voltage scaling alone is not enough to cap power
So far, the analysis indicates that:
You will have to trade off performance for power
– Reduce die area => Reduce active C => Reduce Perf(C) & Power(C)
– Reduce Vdd and freq => Perf(freq) & Power(freq, Vdd^2): cubic
– Reduce die size, Vdd, and freq
Argument:
Does trading off performance for power make sense?
Goal #1: Set a goal that comprehends both performance and power.
Use energy-delay product (E*D) to evaluate tradeoffs

E*D product (lower is better) may provide a better criterion
E = energy / instruction = Power * sec / instruction = Watt / MIPS
D = sec / instruction = 1 / MIPS
$E \cdot D \sim \text{Watt} / \text{MIPS}^2$
[Figure: two plots versus Vdd (0 to 3 volts). First: energy (pJ, 0 to 4) and delay (0 to 1). Second: E x D (100 to 400)]
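
A short sketch of the E*D bookkeeping defined on this slide. The two designs and their Watt and MIPS figures are hypothetical, picked to show a case where the design with higher energy per instruction still wins on E*D.

```python
def energy_delay(watts, mips):
    """Return (E, D, E*D) following the slide's definitions.

    E = Watt / MIPS  -> microjoules per instruction
    D = 1 / MIPS     -> microseconds per instruction
    """
    energy = watts / mips
    delay = 1.0 / mips
    return energy, delay, energy * delay


# Hypothetical designs (assumed numbers, not from the lecture).
e_a, d_a, ed_a = energy_delay(watts=10.0, mips=1000.0)
e_b, d_b, ed_b = energy_delay(watts=20.0, mips=1500.0)

print(f"A: E={e_a:.4f} uJ/inst, D={d_a:.6f} us/inst, E*D={ed_a:.2e}")
print(f"B: E={e_b:.4f} uJ/inst, D={d_b:.6f} us/inst, E*D={ed_b:.2e}")
```

Design B spends more energy per instruction than A, yet its E*D is lower because the speedup more than pays for the extra energy. A pure energy metric would have picked A.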

Power segments
One size does not fit all:
Embedded systems
– Most of the power is consumed by the CPU
– We are not thermally limited.
– What we really care about is battery life.
– In real-time systems we can take advantage of known "deadlines"
Laptops (Mobile systems)
– We are thermally limited.
– We cannot use deadlines (most of the time).
– We need to optimize for max battery life and max performance in a given power envelope.
Desktops:
– We mainly care about power awareness and thermal issues

Example: Adder Designs
Various algorithms exist to implement an integer adder
– Ripple, select, skip (x2), look-ahead, conditional-sum.
– Each with its own characteristics of timing and power consumption.
[Figure: full-adder (FA) block diagrams for Ripple Carry, Variable/Fixed Width Carry Skip, Carry Look-ahead, and Carry Select]

Power and Delay Numbers
According to Callaway and Swartzlander*:

Adder                        Energy (pJ)   Delay (nSec)
Ripple Carry                 117           54.27
Constant Width Carry Skip    109           28.38
Variable Width Carry Skip    126           21.84
Carry Lookahead              171           17.13
Carry Select                 216           19.56
Conditional Sum              304           20.05

If we must choose one option, as-is:
– If power is the objective, use "constant width carry skip"
– If delay is most important, use "carry look-ahead"
* "Estimating the power consumption of CMOS adders", Callaway, T.K.; Swartzlander, E.E., Jr. 11th Symposium on Computer Arithmetic, 1993. Proceedings.
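
The same table can be ranked under each objective, including the energy-delay product, which this slide does not compute. The sketch below uses the Callaway and Swartzlander numbers as quoted; the helper name `best_by` is made up for illustration.

```python
# Adder name -> (energy in pJ, delay in ns), as quoted on the slide.
ADDERS = {
    "Ripple Carry": (117, 54.27),
    "Constant Width Carry Skip": (109, 28.38),
    "Variable Width Carry Skip": (126, 21.84),
    "Carry Lookahead": (171, 17.13),
    "Carry Select": (216, 19.56),
    "Conditional Sum": (304, 20.05),
}


def best_by(metric):
    """Name of the adder that minimizes metric(energy, delay)."""
    return min(ADDERS, key=lambda name: metric(*ADDERS[name]))


lowest_energy = best_by(lambda energy, delay: energy)
lowest_delay = best_by(lambda energy, delay: delay)
lowest_ed = best_by(lambda energy, delay: energy * delay)

print("lowest energy:", lowest_energy)
print("lowest delay: ", lowest_delay)
print("lowest E*D:   ", lowest_ed)
```

The first two results match the slide's recommendations. Under E*D a third design, the variable width carry skip adder, comes out ahead, which illustrates how the choice of metric changes the answer.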

Power Complexity Metrics
Power $\propto C V^2 f$
Metrics: suppose we introduce a new feature that consumes extra x power and gains y performance:
1. Power/Perf ($\propto$ Energy), assuming same technology (same C) and same voltage
» For battery life, energy bills.
» For a given power envelope, without voltage scaling.
2. Power/Perf² ($\propto$ Energy*Delay)
» Balance performance and power needs.
3. Power/Perf³ ($\propto$ Energy*Delay²)
» For a given power envelope, with voltage scaling.
assuming that we can (1) trade frequency and voltage scaling, and (2) lower the voltage as much as we wish
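
One way to apply the three metrics to a candidate feature is to compare each metric before and after the feature is added. The sketch below treats x and y as fractional increases, so power becomes (1 + x) and performance (1 + y) times the baseline. That reading, the function name, and the example numbers are assumptions for illustration.

```python
def metric_change(extra_power, perf_gain, exponent):
    """Relative change of Power / Perf**exponent after adding a feature.

    extra_power: fractional power increase x (0.10 means +10%).
    perf_gain:   fractional performance increase y.
    exponent:    1 for Power/Perf, 2 for Power/Perf^2, 3 for Power/Perf^3.
    A result below 1.0 means the feature improves that metric.
    """
    return (1.0 + extra_power) / (1.0 + perf_gain) ** exponent


# Hypothetical feature: +10% power for +5% performance.
for exponent in (1, 2, 3):
    change = metric_change(extra_power=0.10, perf_gain=0.05, exponent=exponent)
    verdict = "better" if change < 1.0 else "worse"
    print(f"Power/Perf^{exponent}: {change:.3f}X ({verdict})")
```

The same feature hurts Power/Perf, is roughly neutral on Power/Perf², and helps on Power/Perf³, so whether it is worth adding depends on which segment and which metric the design targets.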