
Technion, EE department
Dr. Avi Mendelson, Intel

Low power Architectures

Lecture #1: Introduction

Dr. Avi Mendelson
[email protected]
Contributors: Ronny Ronen, Eli Savransky, Shekhar Borkar, Fred Pollack

Dr. Avi Mendelson Page 2

Why Power Matters?


Dr. Avi Mendelson Page 3

Moore's Law

For many years technology obeys Moore's Law

[Figure: transistors per die (log scale, 10^0 to 10^8) versus year, 1970 to 2000. Memory: 1K, 4K, 16K, 64K, 256K, 1M, 4M, 16M, 64M, 256M. Microprocessor: 4004, 8080, 8086, 80286, i386™, i486™, Pentium®, Pentium® Pro, Pentium® II, Pentium® III. Source: Intel]

Dr. Avi Mendelson Page 4

In the Last 25 Years Life was Easy (*)

Doubling of transistor density every 30 months
Increasing die sizes, allowed by
– Increasing wafer size
– Process technology moving from "black art" to "manufacturing science"
⇒ Doubling of transistors every 18 months

Tech    Old µArch      mm (linear)   New µArch      mm (linear)   Area Ratio
1.0µ    i386C          6.5           i486           11.5          3.1
0.7µ    i486C          9.5           Pentium®       17            3.2
0.5µ    Pentium®       12.2          Pentium® Pro   17.3          2.1
0.18µ   Pentium® III   10.3          Next Gen       ?             2-3

Implications: (in the same technology)

1. New µArch ~ 2-3X die area of the last µArch

2. Provides 1.5-1.7X integer performance of the last µArch

(*) Source: Fred Pollack, Micro-32


Dr. Avi Mendelson Page 5

Suddenly, the power monster appears

Dr. Avi Mendelson Page 6

The power crisis: power consumption

Source: Cool Chips, Micro-32


Dr. Avi Mendelson Page 7

Processor Power Evolution

New generation: always increases power
Compactions: higher performance at lower power
One size fits all: start with the high-power segment and shrink it to Mobile

[Figure: max power (Watts, log scale 1 to 100) versus process technology (1.5µ, 1µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ) for i386, i486, Pentium®, Pentium® w/MMX tech., Pentium® Pro, Pentium® II, Pentium® III and Pentium® 4, with a "?" for what follows.]

Dr. Avi Mendelson Page 8

The power crisis: Power Density

A real threat to Moore's law
Think of watts/cm²

Power is not distributed evenly over the chip. A failure can happen if a single point reaches the max power point.
Complex algorithms lead to denser power:
– Dense random logic
Timing pressure leads to faster/bigger/power-hungrier gates
– Designers put together units that communicate with each other. This creates "regions" with high activity factors -> hot spots.


Dr. Avi Mendelson Page 9

Power Density

[Figure: power density (Watts/cm², log scale 1 to 1000) versus process technology (1.5µ, 1µ, 0.7µ, 0.5µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ, 0.1µ, 0.07µ) for i386, i486, Pentium®, Pentium® Pro, Pentium® II, Pentium® III and Pentium® 4, with reference levels labeled "Hot plate", "Nuclear Reactor" and "Rocket Nozzle".]

* "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp., Micro32 conference keynote, 1999.

Dr. Avi Mendelson Page 10

Some implications

We can't build microprocessors with ever increasing power density and die sizes
– The constraint is power, not manufacturability

The design of any future microprocessor should take power into consideration. We need to distinguish between different aspects of power:
– Power delivery
– Max power (TJ)
– Power density - hot spots
– Energy - static + dynamic

Power and Energy aware design should take care of each of these aspects

One size does not fit all anymore


Dr. Avi Mendelson Page 11

Why power and power density increase over time

Dr. Avi Mendelson Page 12

Basic terminology


Dr. Avi Mendelson Page 13

Power and the digital world…

Power is consumed:
– When capacitance is charged and discharged
– A charged cap is a logical '1', a discharged cap is '0'

The capacitance can be the gates of other transistors or wires (buses and long interconnects)

[Figure: gate with IN and OUT signals switching between 0 and 1]

E = CV²

Dr. Avi Mendelson Page 14

Power and the digital world (2)…

Secondary effects like leakage and short-circuit current are increasing with advanced process technologies

Leakage is growing dramatically
– 7% now, expect 20% in next process technology, 50% in next one
– … Unless we do something (and we will)

[Figure: two IN/OUT gate diagrams labeled "Leakage (sub-threshold)" and "Short-circuit", annotated with the signal values 0, 1 and 1/2]


Dr. Avi Mendelson Page 15

Power & Energy

Energy
"The capacity for doing work" *
Important for
– Battery life - lower energy per task -> longer battery life
– Electric bills - lower energy per task -> lower bills
Measured over time
Proportional to the overall capacitance and to the voltage squared (CV²)

* Merriam-Webster's Collegiate® Dictionary - http://www.m-w.com/

Dr. Avi Mendelson Page 16

Power & Energy

Power
Work done per time unit
– Measured in Watts
P = αCV²f (α: activity, C: capacitance, V: voltage, f: frequency)
"Measured" at peak time
Higher power -> higher current
– Cannot exceed platform power delivery constraints
Higher power -> higher temperature
– Cannot exceed the thermal constraints
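As a rough illustration of the P = αCV²f relation, the sketch below computes dynamic power, the supply current it implies, and the energy of a fixed-length task. All numeric values (activity factor, switched capacitance, voltage, frequency, cycle count) are assumptions chosen for illustration, not figures from the lecture.

```python
def dynamic_power(activity: float, capacitance_f: float,
                  voltage_v: float, freq_hz: float) -> float:
    """Dynamic power in watts: P = alpha * C * V^2 * f."""
    return activity * capacitance_f * voltage_v ** 2 * freq_hz


def supply_current(power_w: float, voltage_v: float) -> float:
    """Current the power-delivery network must supply: I = P / V."""
    return power_w / voltage_v


def task_energy(power_w: float, cycles: float, freq_hz: float) -> float:
    """Energy in joules for a task of a given cycle count: E = P * t."""
    return power_w * (cycles / freq_hz)


# Hypothetical chip (all values assumed): alpha = 0.1, C = 20 nF, Vdd = 1.5 V, f = 1 GHz.
p = dynamic_power(0.1, 20e-9, 1.5, 1e9)
print(f"power   = {p:.2f} W")
print(f"current = {supply_current(p, 1.5):.2f} A")
print(f"energy for 2e9 cycles = {task_energy(p, 2e9, 1e9):.2f} J")
```

Power bounds what the platform must deliver and cool at any instant, while energy is what a battery pays per task; the two helper functions keep those quantities separate.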


Dr. Avi Mendelson Page 17

Voltage, Power, Frequency

Transistor switches faster at higher voltage
Higher voltage enables higher frequency
Maximum frequency grows about linearly with voltage…
Within a given voltage range Vmin-Vmax
– V < Vmin: transistors won't switch
– V > Vmax: the device may burn

"The cube law": P = kV³ (or ~1%V = 3%P)

[Figure: XScale processor frequency (MHz) and power (mWatt), both on a 0 to 1000 scale, versus voltage from 0.5 to 1.9 V *]

* Source: Intel Corp. (http://developer.intel.com)
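The cube law follows from P = αCV²f once frequency is taken to scale linearly with voltage, so that P is proportional to V³. The sketch below checks the "~1%V = 3%P" rule of thumb numerically; the function name and the sample voltage changes are illustrative assumptions.

```python
def relative_power_change(voltage_change: float) -> float:
    """Fractional power change for a fractional voltage change under P = k * V^3.

    Assumes frequency tracks voltage linearly, which the slide says holds
    only inside the Vmin-Vmax range.
    """
    return (1.0 + voltage_change) ** 3 - 1.0


for dv in (0.01, -0.01, -0.10, -0.20):
    dp = relative_power_change(dv)
    print(f"voltage {dv:+.0%} -> power {dp:+.1%}")
```

For small changes the result is close to three times the voltage change; for larger ones (such as a 20% reduction) the cubic term makes the saving noticeably less than 60%.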

Dr. Avi Mendelson Page 18

The scalability theory - process technology

Target (ideally): each generation (2-3 years)
1. Reduce gate delay by 30% -> 50% freq gain
2. Increase density by 2x
– 0.7 shrink on a side, 50% area reduction on compaction
– Transistor Z and L shrink by 30%
– Interconnect pitches shrink by ~30%
– Add metal layers to make up for (1) pitches < 30%, and (2) RC


Dr. Avi Mendelson Page 19

Scaling theory - 1 of 2

Lateral and vertical dimensions reduce 30%

$$\text{Width} = W = 0.7, \quad \text{Length} = L = 0.7, \quad t_{ox} = 0.7$$

Capacitance (area and fringing) reduces 30%

$$\text{Area Cap} = C_a = \frac{0.7 \times 0.7}{0.7} = 0.7, \quad \text{Fringing Cap} = C_f = 0.7 \;\Rightarrow\; \text{Total Cap} = C = 0.7$$

Die area reduces 50%

$$\text{Die Area} = X \times Y = 0.7 \times 0.7 = 0.7^2$$

Dr. Avi Mendelson Page 20

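Scaling theory - 2 of 2

Capacitance per transistor reduces 30%

$$\frac{\text{Cap}}{\text{Transistor}} = \frac{0.7}{1} = 0.7$$

Capacitance per unit area increases 43%

$$\frac{\text{Cap}}{\text{Area}} = \frac{0.7}{0.7 \times 0.7} = \frac{1}{0.7}$$

Delay reduces 30%, power reduces 50%

$$V_{dd} = 0.7, \quad V_t = 0.7, \quad I = \frac{W}{t_{ox}}\,(V_{dd} - V_t) = \frac{0.7}{0.7} \times 0.7 = 0.7$$

$$T = \frac{C \times V_{dd}}{I} = \frac{0.7 \times 0.7}{0.7} = 0.7, \quad \text{Power} = C \times V^2 \times f = \frac{0.7 \times 0.7^2}{0.7} = 0.7^2$$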
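The scaling factors on these two slides can be reproduced mechanically. The following is a minimal sketch, assuming every dimension and voltage scales by the same factor s = 0.7 and that the overdrive (Vdd − Vt) therefore also scales by s; the function and key names are illustrative.

```python
def scaling_factors(s: float = 0.7) -> dict:
    """Per-generation scaling factors when dimensions and voltages scale by s."""
    width = length = t_ox = s            # lateral and vertical dimensions
    vdd = s
    overdrive = s                        # (Vdd - Vt): both voltages scale by s
    cap = width * length / t_ox          # capacitance per transistor
    area = width * length                # area per transistor, and die area
    current = (width / t_ox) * overdrive
    delay = cap * vdd / current          # T = C * Vdd / I
    power = cap * vdd ** 2 / delay       # C * V^2 * f with f = 1 / delay
    return {
        "cap_per_transistor": cap,
        "die_area": area,
        "cap_per_area": cap / area,
        "current": current,
        "delay": delay,
        "power": power,
    }


for name, value in scaling_factors().items():
    print(f"{name:>20}: {value:.2f}x")
```

With s = 0.7 this gives 0.7x delay, 0.49x power and die area, and about 1.43x capacitance per unit area, in line with the captions on the slides.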


Dr. Avi Mendelson Page 21

Process Technology - the Enabler

Every process generation (every 2-3 years), ideally:
– Shorten gate delay by 30% -> ~50% (100/70) frequency gain
– Vdd scaled down by ~30%

Results:
» 2/3 reduction in energy/transition (CV²: 0.7 x 0.7² = 0.34X)
» 1/2 reduction in power (CV²f: 0.7 x 0.7² x 1.5 = 0.5X)
» Power density unchanged

Dr. Avi Mendelson Page 22

Ideal Scenarios...

Ideal "Shrink"
– Same µarch
– 1X #Xistors
– 0.5X size
– 1.5X frequency
– 0.5X power
– 1X IPC (instr./cycle)
– 1.5X performance
– 1X power density

Ideal New µarch
– Same die size
– 2X #Xistors
– 1X size
– 1.5X frequency
– 1X power
– 2X IPC
– 3X performance
– 1X power density

Looks good. Isn't it?
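The two ideal scenarios can be checked against the per-generation factors quoted earlier (capacitance per transistor 0.7X, Vdd 0.7X, frequency 1.5X). The sketch below is an illustration under those assumptions; the `scenario` helper and its parameter names are hypothetical.

```python
CAP_PER_TRANSISTOR = 0.7   # capacitance per transistor after one shrink
VDD = 0.7                  # supply voltage scaling
FREQ = 1.5                 # the slides round 1/0.7 (about 1.43) up to 1.5


def scenario(transistors: float, area: float, ipc: float,
             freq: float = FREQ) -> dict:
    """Relative power, performance and power density versus the old chip."""
    power = transistors * CAP_PER_TRANSISTOR * VDD ** 2 * freq
    return {
        "power": power,
        "performance": ipc * freq,
        "power_density": power / area,
    }


shrink = scenario(transistors=1.0, area=0.5, ipc=1.0)
new_uarch = scenario(transistors=2.0, area=1.0, ipc=2.0)
print("ideal shrink   :", {k: round(v, 2) for k, v in shrink.items()})
print("ideal new uarch:", {k: round(v, 2) for k, v in new_uarch.items()})
```

Both cases land at roughly 1X power density, which is why the ideal picture looks sustainable.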


Dr. Avi Mendelson Page 23

Process Technologies - Reality

But in reality:
– New designs squeeze frequency to 2X per process
– New designs use more transistors (2X-3X to get 1.5X-1.7X perf)

So, every new process and architecture generation:
– Power goes up about 2X
– Power density goes up 30%~80%

This is bad, and…
Will get worse in future process generations:
– Voltage (Vdd) will scale down less
– Leakage is going through the roof

Not as good as it first looked… Aha?
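A back-of-the-envelope model shows how the "reality" numbers can arise from the same CV²f relation. This is only a sketch under stated assumptions: capacitance per transistor scales by 0.7, transistor density doubles per generation (so N times the transistors occupy N/2 times the area), and the 0.8 voltage factor stands in for "Vdd will scale down less". None of these specific inputs come from the slide.

```python
def power_and_density(transistors: float, freq: float,
                      vdd: float = 0.7, cap_per_transistor: float = 0.7):
    """Relative power and power density for one process + architecture step."""
    area = transistors / 2.0   # assumption: density doubles each generation
    power = transistors * cap_per_transistor * vdd ** 2 * freq
    return power, power / area


for n, vdd in ((2.0, 0.7), (3.0, 0.7), (3.0, 0.8)):
    power, density = power_and_density(transistors=n, freq=2.0, vdd=vdd)
    print(f"{n:.0f}X transistors, Vdd x{vdd}: "
          f"power {power:.2f}X, power density {density:.2f}X")
```

Under these assumptions a 3X-transistor design at 2X frequency lands near 2X power, and power density rises by roughly 37% to 79% depending on how far the voltage scales, which is in the same range as the figures quoted on the slide.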

Dr. Avi Mendelson Page 24

In Picture…

[Figure: processors across silicon process technologies (1.5µ, 1.0µ, 0.8µ, 0.6µ, 0.35µ, 0.25µ, 0.18µ, 0.13µ): Intel386™ DX Processor, Intel486™ DX Processor, Pentium® Processor, Pentium® Pro Processor, Pentium® II Processor, Pentium® III Processor, Pentium® 4 Processor]


Dr. Avi Mendelson Page 25

Performance Efficiency of µarchitectures

Tech    Old µArch           mm (linear)   New µArch           mm (linear)   Area Ratio   Perf Ratio
1.0µ    i386C               6.5           i486                11.5          3.1
0.7µ    i486C               9.5           Pentium® proc       17            3.2          1.8
0.5µ    Pentium® proc       12.2          Pentium® Pro proc   17.3          2.1          1.5
0.18µ   Pentium® III proc   10.3          Pentium® 4 proc     14.7          2            1.4

Implications: (in the same technology)

1. New µarch ~2-3X die area of the last µarch

2. Provides 1.4-1.8X integer performance of the last µarch

We are on the Wrong Side of a Square Law

[Figure: bar chart of area and performance ratios (scale 0 to 3.5) for 486=>PP, PP=>Ppro and PIII=>P4P]
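One way to read the "square law" is that performance grows roughly with the square root of die area, i.e. with the linear die dimension. That reading is an interpretation, not something the slide states explicitly. The sketch below tests it against the three rows of the table that list a performance ratio.

```python
import math

# (transition, old die side in mm, new die side in mm, reported perf ratio)
ROWS = [
    ("i486C -> Pentium", 9.5, 17.0, 1.8),
    ("Pentium -> Pentium Pro", 12.2, 17.3, 1.5),
    ("Pentium III -> Pentium 4", 10.3, 14.7, 1.4),
]


def area_ratio(old_mm: float, new_mm: float) -> float:
    """Die area ratio computed from the linear dimensions."""
    return (new_mm / old_mm) ** 2


def sqrt_rule_perf(old_mm: float, new_mm: float) -> float:
    """Performance ratio predicted if perf scales with sqrt(area)."""
    return math.sqrt(area_ratio(old_mm, new_mm))


for name, old_mm, new_mm, reported in ROWS:
    print(f"{name:26s} area {area_ratio(old_mm, new_mm):.2f}X, "
          f"sqrt rule {sqrt_rule_perf(old_mm, new_mm):.2f}X, "
          f"reported {reported}X")
```

The predicted values stay within about 0.1 of the reported performance ratios, so doubling performance this way would cost roughly four times the area.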

Dr. Avi Mendelson Page 26

Power Evolution (Theoretical)

For a 15mm/side die (225mm²)
Assume 2X frequency increase each generation
Future process numbers are estimated

[Figure: leakage power and active power (Watts, 0 to 250) and power density (W/cm², 0 to 100) for the 0.25µ, 0.18µ, 0.13µ and 0.1µ processes]


Dr. Avi Mendelson Page 27

More aspects of power aware architectures

Dr. Avi Mendelson Page 28

The "Power Bottleneck" (Hot Spots)
The Thermal Story

Silicon is not a good heat conductor
With high power density, cannot assume power uniformity
– High temperature -> high leakage -> high power -> higher temperature
Artificially expanding the die size does not help. Must attack the hot spots
Smart layout that separates the hot units increases the processor's power envelope!

[Figure: two die maps, "Power Map" (heat flux, scale 0 to 250) and "On-Die Temperature" (temperature, scale 40 to 110)]

* "New Microarchitecture Challenges in the Coming Generations of CMOS Process Technologies", Fred Pollack, Intel Corp., Micro32 conference keynote, 1999.


Dr. Avi Mendelson Page 29

The Mythical Power Envelope

CPU "power envelope"
– Maximum power that commercial cooling technologies can dissipate

Limited by
– Total system power
– Processor power. Typical figures: Server <130W, Desktop 50-80W, Notebook 20-30W, sub-notebook <10W

Bigger systems cool better and dissipate more power
– Heat sinks
– Heat pipes
– Better TIM (Thermal Interface Materials)

Average power density matters:
– Uniformly distributed power allows for higher CPU dissipation

Dr. Avi Mendelson Page 30

Energy Efficiency

Energy per task
– Proportional to # of processed insts. per task
– Proportional to the average work consumed per instruction
– Deteriorates as speculation increases and complexity grows

Or formally, for a given task,
– Energy per retired instruction is β*W, where
  » β: ratio of total to retired number of processed instructions
  » W: average energy spent in processing an instruction
Both figures grow with every new micro-architecture

In that respect: high performance modern micro-architectures are less energy-efficient
Luckily, process technology offsets that by reducing energy per switch
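The β*W definition can be illustrated with a small calculation. The two cores and all their numbers below are hypothetical, chosen only to show how more speculation (larger β) and more complexity (larger W) compound.

```python
def energy_per_retired_instruction(total_insts: float, retired_insts: float,
                                   energy_per_processed_inst: float) -> float:
    """beta * W, with beta = total processed / retired instructions."""
    beta = total_insts / retired_insts
    return beta * energy_per_processed_inst


# Hypothetical cores; energies are in nJ per processed instruction (assumed).
simple_core = energy_per_retired_instruction(105, 100, 1.0)
speculative_core = energy_per_retired_instruction(140, 100, 1.5)
print(f"simple core      : {simple_core:.2f} nJ per retired instruction")
print(f"speculative core : {speculative_core:.2f} nJ per retired instruction")
```

In this made-up case the speculative core spends twice the energy per retired instruction, even though W alone grew by only 50%.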


Dr. Avi Mendelson Page 31

Voltage Scaling

Within a given voltage range, higher voltage allows higher freq.

Used for trading power and frequency. Either
– Statically, at manufacturing time
– Dynamically, at run time (e.g., Intel's SpeedStep® Technology)

Actual range depends on specific design and process technology
Examples*:
– Intel® XScale™ processors run from 0.75V (150MHz/50mW) to 1.65V (800MHz/900mW)
– Intel mobile Pentium® III processor sells from 1.1V (600MHz) to 1.7V (1GHz)

[Figure: XScale proc. frequency (MHz) and power (mWatt), both on a 0 to 1000 scale, versus voltage from 0.5 to 1.9 V]

* Source: Intel Corp. (http://developer.intel.com)
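The two XScale operating points quoted above are enough to estimate energy per clock cycle at each end of the range. The sketch below uses only those quoted numbers; the helper name is illustrative, and treating power divided by frequency as "energy per cycle" lumps leakage and other fixed power into the result.

```python
# (voltage in V, frequency in Hz, power in W) as quoted for the XScale processor
LOW_POINT = (0.75, 150e6, 50e-3)
HIGH_POINT = (1.65, 800e6, 900e-3)


def energy_per_cycle(power_w: float, freq_hz: float) -> float:
    """Average energy per clock cycle in joules: E = P / f."""
    return power_w / freq_hz


e_low = energy_per_cycle(LOW_POINT[2], LOW_POINT[1])
e_high = energy_per_cycle(HIGH_POINT[2], HIGH_POINT[1])
print(f"0.75 V: {e_low * 1e9:.3f} nJ/cycle")
print(f"1.65 V: {e_high * 1e9:.3f} nJ/cycle")
print(f"energy per cycle ratio: {e_high / e_low:.2f}X")
print(f"frequency ratio       : {HIGH_POINT[1] / LOW_POINT[1]:.2f}X")
print(f"power ratio           : {HIGH_POINT[2] / LOW_POINT[2]:.1f}X")
```

Running at the low-voltage point costs about 5.3X in frequency but cuts power by 18X, so each cycle takes roughly a third of the energy.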

Dr. Avi Mendelson Page 32

Alpha hot spots

[Figure: Alpha hot spots, with labels Area 30%, Freq. 50%, Power 67%. Source: CoolChips-99]


Dr. Avi Mendelson Page 33

Voltage Scaling (cont.)

Huge effect on power:
20% freq reduction -> 20% voltage reduction
– ~35% energy reduction (αCV² = αC*0.8² = αC*0.64)
– ~50% power reduction (αCV²f = αC*0.8³ = αC*0.51)

Even more impressive if we recall:
– 20% freq hit -> only 10%-15% performance hit*

Voltage scaling can be used to trade performance with power!

* Depends mainly on core to bus frequency ratio and cache size.
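The arithmetic on this slide generalizes to any frequency reduction, as long as voltage can be lowered in step with frequency. The sketch below assumes that linear tracking holds; the 12.5% performance hit is simply the midpoint of the 10%-15% range quoted above, and the function name is illustrative.

```python
def voltage_scaling(freq_scale: float):
    """Relative energy and power when voltage is scaled along with frequency.

    Assumes voltage can be lowered by the same factor as frequency,
    which only holds inside the Vmin-Vmax range.
    """
    v_scale = freq_scale
    energy = v_scale ** 2                 # alpha * C * V^2
    power = v_scale ** 2 * freq_scale     # alpha * C * V^2 * f
    return energy, power


energy, power = voltage_scaling(0.8)
perf = 1.0 - 0.125   # midpoint of the quoted 10%-15% performance hit
print(f"energy per switch: {energy:.2f}X")
print(f"power            : {power:.3f}X")
print(f"performance/power: {perf / power:.2f}X")
```

Giving up about an eighth of the performance roughly halves the power, so performance per watt improves by around 1.7X in this example.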

Dr. Avi Mendelson Page 34

What is the impact of the computer architecture?


Dr. Avi Mendelson Page 35

Power and performance trade-off

Voltage scaling alone is not enough to cap power

So far, the analysis indicates that:

You will have to trade off performance for power
– Reduce die area => Reduce Active C => Reduce Perf(C) & Power(C)
– Reduce Vdd and freq => Perf(freq) & Power(freq, Vdd²), i.e. cubic
– Reduce die size, Vdd, and freq

Argument:

Does trading off performance for power make sense?

Goal #1: Set a goal that comprehends both performance and power.

Use energy delay product (E*D) to evaluate tradeoffs

Dr. Avi Mendelson Page 36

E*D product (lower is better) may provide a better criterion

E = energy / instruction = Power * sec / instruction = Watt / MIPS

D = sec / instruction = 1 / MIPS

E*D ~ Watt / MIPS²

[Figure: energy (pJ, 0 to 4) and delay (0 to 1) versus Vdd (0 to 3 volts), and E x D (100 to 400) versus Vdd (0 to 3 volts)]
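The Watt/MIPS² form of the energy-delay product is easy to evaluate for competing design points. The sketch below compares two hypothetical designs; their power and MIPS figures are assumptions for illustration only.

```python
def energy_delay_product(power_w: float, mips: float) -> float:
    """E*D in Watt / MIPS^2 (lower is better)."""
    energy = power_w / mips    # Watt/MIPS: microjoules per instruction
    delay = 1.0 / mips         # microseconds per instruction
    return energy * delay


# Hypothetical design points (assumed numbers).
fast_hot = energy_delay_product(power_w=10.0, mips=1000.0)
slow_cool = energy_delay_product(power_w=4.0, mips=800.0)
print(f"fast/hot  design: E*D = {fast_hot:.3e}")
print(f"slow/cool design: E*D = {slow_cool:.3e}")
print("better on E*D:", "slow/cool" if slow_cool < fast_hot else "fast/hot")
```

Because delay enters twice (once through energy per instruction, once directly), the metric penalizes a slow design more than plain Watt/MIPS would, yet the slower design still wins here because its power saving is large enough.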


Dr. Avi Mendelson Page 37

Power segments

One size does not fit all:

Embedded systems
– Most of the power is consumed by the CPU
– We are not thermally limited.
– What we really care about is battery life.
– In real time systems we can take advantage of known "deadlines"

Laptops (Mobile systems)
– We are thermally limited.
– We cannot use deadlines (most of the time).
– We need to optimize for max battery life and max performance in a given power envelope.

Desktops:
– We mainly care about power awareness and thermal issues

Dr. Avi Mendelson Page 38

Example: Adder Designs

Various algorithms exist to implement an integer adder
– Ripple, select, skip (x2), look-ahead, conditional-sum.
– Each with its own characteristics of timing and power consumption.

[Figure: full-adder (FA) block diagrams for Ripple Carry, Variable/Fixed Width Carry Skip, Carry Look-ahead and Carry Select]


Dr. Avi Mendelson Page 39

Power and Delay Numbers

According to Callaway and Swartzlander*:

Adder                        Energy (pJ)   Delay (nSec)
Ripple Carry                 117           54.27
Constant Width Carry Skip    109           28.38
Variable Width Carry Skip    126           21.84
Carry Lookahead              171           17.13
Carry Select                 216           19.56
Conditional Sum              304           20.05

If we must choose one option, as-is:
– If power is the objective - use "constant width carry skip"
– If delay is most important - use "carry look-ahead"

* "Estimating the power consumption of CMOS adders" - Callaway, T.K.; Swartzlander, E.E., Jr. 11th Symposium on Computer Arithmetic, 1993. Proceedings.
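The selection above can be reproduced directly from the table, and the same data can be ranked by the energy-delay product discussed earlier in the lecture. The following sketch uses the energy and delay figures exactly as listed; the helper name `best_by` is illustrative.

```python
# Adder name -> (energy in pJ, delay in ns), as listed in the table.
ADDERS = {
    "Ripple Carry": (117, 54.27),
    "Constant Width Carry Skip": (109, 28.38),
    "Variable Width Carry Skip": (126, 21.84),
    "Carry Lookahead": (171, 17.13),
    "Carry Select": (216, 19.56),
    "Conditional Sum": (304, 20.05),
}


def best_by(metric) -> str:
    """Name of the adder that minimizes metric(energy, delay)."""
    return min(ADDERS, key=lambda name: metric(*ADDERS[name]))


lowest_energy = best_by(lambda energy, delay: energy)
fastest = best_by(lambda energy, delay: delay)
best_energy_delay = best_by(lambda energy, delay: energy * delay)

print("lowest energy:", lowest_energy)
print("lowest delay :", fastest)
print("lowest E*D   :", best_energy_delay)
```

On these numbers the energy-delay product picks a third design, the variable width carry skip adder, which is neither the lowest-energy nor the fastest option.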

Dr. Avi Mendelson Page 40

Power Complexity Metrics

Power ∝ C V² f

Metrics: suppose we introduce a new feature that consumes extra x power and gains y performance:

1. Power/Perf (~ Energy), assuming same technology (same C) and same voltage
   » For battery life, energy bills.
   » For a given power envelope - without voltage scaling.

2. Power/Perf² (~ Energy*Delay)
   » Balance performance and power needs.

3. Power/Perf³ (~ Energy*Delay²)
   » For a given power envelope - with voltage scaling.
   assuming that we can (1) trade frequency and voltage scaling, and (2) lower the voltage as much as we wish
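The three metrics can give different verdicts on the same feature. The sketch below evaluates a hypothetical feature that costs 20% extra power for 10% extra performance; those two numbers and the function name are assumptions for illustration. A ratio below 1 means the feature improves the metric.

```python
def metric_ratio(extra_power: float, extra_perf: float, exponent: int) -> float:
    """New-to-old ratio of Power / Perf^exponent after adding a feature.

    extra_power and extra_perf are fractional increases (x and y on the slide).
    """
    return (1.0 + extra_power) / (1.0 + extra_perf) ** exponent


X, Y = 0.20, 0.10   # hypothetical feature: +20% power, +10% performance
for exponent, label in ((1, "Power/Perf   (Energy)"),
                        (2, "Power/Perf^2 (Energy*Delay)"),
                        (3, "Power/Perf^3 (Energy*Delay^2)")):
    ratio = metric_ratio(X, Y, exponent)
    verdict = "improves" if ratio < 1.0 else "worsens"
    print(f"{label:30s}: {ratio:.3f} -> {verdict}")
```

The feature loses on pure energy but wins on the other two metrics. Under the third metric a feature breaks even when its power cost is about three times its performance gain for small changes, which mirrors the cube law.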