[email protected] April 2010
Conference opening keynote, IV Southern Programmable Logic Conference (SPL 2010), 24-26 March 2010, Porto Galinhas Beach, Ipojuca, Pernambuco, Brazil
Reiner Hartenstein, TU Kaiserslautern, Germany, http://hartenstein.de
Reconfigurable Computing: boosting Software Education for the Multicore Era
Reiner Hartenstein
Why we need to reinvent Computing
© 2010, [email protected] http://hartenstein.de
TU Kaiserslautern
Preface: survival problems of our computer-based infrastructure
its energy consumption may become unaffordable
this may massively crash the world economy
this requires reinventing computing
the qualified programmer population required for this does not yet exist
(next to this, the many-core crisis looks like a minor problem)
disaster prevention requires huge efforts
Computers unaffordable: in Investment Banking?
earn billions
can pay it
or go bankrupt
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
Why Computers are important
• BANKS
• BUSINESS
• BUSINESS INFORMATION SYSTEMS
• BIOLOGY AND MEDICAL SCIENCE
• EDUCATION
• MEDIA, TRAVEL AND TICKETING
• WEATHER PREDICTIONS
• SPORTS
• DAILY LIFE
• EMBEDDED
• INTERNET
Cars made without computers
no variations
no extra accessories
only black
Robotics?
impossible without computers
Banking without computers?
back to the Roaring Twenties?
Business Information Systems … without computers
Lufthansa anno 1960
homo Neanderthalensis?
homo sapiens?
without computers?
the infrastructure: to keep it intact …
to keep it unbroken,
• we need to reinvent computing …
• … and to revolutionize programmer education
Software is an ugly term …
• … it stands for a narrow-minded, CPU-centric world model of CS
we need an approach using less Software
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
Green Computing
One example of green computing: efficient power supplies
Raising efficiencies from 60-70% to 90% or better cuts energy losses by a factor of up to 4.
250 watts, up to 380 watts
http://forum.00de.de/archive/konsolen-und-videospiele/playstation-3-bekommt-import-verbot-fuer-europa-t-37867.html
it's really needed!
… but not the silver bullet
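
The quoted factor of 4 can be checked with a few lines of arithmetic. The following is a minimal Python sketch and not part of the original slides: the function names are illustrative, and it assumes that "energy losses" means the share of the input power that a power supply dissipates.

```python
def loss_fraction(efficiency):
    """Share of the input power that is dissipated instead of delivered."""
    return 1.0 - efficiency


def loss_reduction(old_efficiency, new_efficiency):
    """Factor by which the dissipated share shrinks when efficiency improves."""
    return loss_fraction(old_efficiency) / loss_fraction(new_efficiency)


# Assumed starting points taken from the "60-70%" range on the slide.
for old in (0.60, 0.70):
    factor = loss_reduction(old, 0.90)
    print(f"{old:.0%} -> 90%: losses cut by a factor of {factor:.1f}")
```

Under this assumption the factor is about 4 when starting from 60% efficiency and about 3 when starting from 70%.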
Don't need the PS3: I already have a grill
~1999: PCs are coming …
"Dig more coal -- the PCs are coming", Peter W. Huber, Mark P. Mills, 05.31.99
http://www.forbes.com/forbes/1999/0531/6311070a.html
[1989 from a student at Kaiserslautern]
never run out of energy?
[Figure: typical oil field operation, production (%) from 0 to 100 over time, with marks "> 30 %" and "~ 55 %"; energy sources shown: coal, hydro, nuclear, gas, oil]
2007: 80% of crude oil coming from fields in decline
natural gas: similar situation
"6 more Saudi Arabias needed for demand predicted for 2030"
[Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/
Oil crisis in 1973: weekend ban on driving
Oil crisis in 1979/80: weekend ban on driving
Crude oil prices in US-$ per barrel
[Figure: price chart with axis up to 160; 2010, March 14: > 82 US-$]
Power Consumption of Computers
Energy cost may overtake IT equipment cost in the near future
... has become an industry-wide issue: incremental improvements are on track,
"we may ultimately need revolutionary new solutions" [Horst Simon, LBNL, Berkeley]
[Albert Zomaya]
Power consumption by the internet: x30 by 2030 if trends continue [G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8-11 Sep 2008]
at Dallas [Randy Katz: IEEE Spectrum, Febr. 2009]
"Google causes 2% of the world's electricity consumption" (Google denied)
(~90% paid by customers?)
vN: a Massive Power Guzzler
Software often has very bad performance
it's a symptom of the von Neumann Syndrome:
Software is extremely power-hungry because of massively memory-cycle-hungry instruction streams
we need an approach using less Software
twin paradigm
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
Single-core approach: Software Performance
[Figure: performance on a log scale from 10^3 to 10^9 versus year, 1970 to 2008]
free ride on Moore's Law
the burden of software performance is the task of chip designers*
*) M-&-C-created population
The end of the GHz race
the end of the single-core era
… it's a power issue
The End of single-core
[Figure: performance on a log scale from 10^3 to 10^9 versus year, 1970 to 2008, marked "the end of the single-core era"]
Growth beyond Single-core?
[Figure: relative performance on a log scale from 10^3 to 10^13 versus year, 1970 to 2030, marked "the end of the single-core era"]
"Multicore shifts the burden of Performance from Chip Designer to Software Developers." [J. Larus: Spending Moore's Dividend; C_ACM, May 2009]
we need to learn parallel programming
... performance drops, productivity & other problems ...
the current SE population is not qualified
Performance Growth by Multicore?
[Figure: relative performance versus year, 1994 to 2030, marked "begin of the multicore era"]
& massive programmer productivity problems
von-Neumann-only is not the silver bullet
Reconfigurable Computing is indispensable!
Amid the Clamor
Michael Wrinn, a senior course architect in the Intel Software College (keynote at SIGCSE 2010): "Suddenly, All Computing Is Parallel: Seizing Opportunity Amid the Clamor"
http://www.sigcse.org/sigcse2010/attendees/keynotes.php
"Foundational change will disrupt traditional habits throughout the discipline ...."
"The proud era of von Neumann architecture passes into history."
bring parallel computing into the mainstream of undergraduate education
our goal: twin paradigm
Amid the Clamor?
Michael Wrinn, "Seizing Opportunity Amid the Clamor": does Michael have the solution?
Multimedia in the Multicore Era
[Figure: relative performance versus year, 1994 to 2030, marked "begin of the multicore era"]
Multimedia Performance Needs
application performance needs up to [Pierre Paulin, MPSoC'09]:
Audio: 800 MIPS
Graphics: 11 GOPS
Video: 160 GOPS
Digital TV: 900 GOPS
needed performance growing faster than Moore's law
[Figure, courtesy E. Sanchez: MIPS needed by GSM, GPRS, EDGE, UMTS and the next standard]
ICT market at an inflection point
Cowhey's & Aronson's Law
Prosperity depends on network capacity, ..., efficient pricing, flexible platforms, & ...
Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.
Broadband is significant at the inflection point, prompting major market governance changes
The battle for the living room & mobile is more important than the PC market.
... Cheap Revolution:
• affordable broadband
• software performance
• low power
twin paradigm
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
RC*: Demonstrating the intensive Impact
SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster
[Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]
Application                   Speed-up factor   Savings: Power   Cost   Size
DNA and Protein sequencing    8723              779              22     253
DES breaking                  28514             3439             96     1116
much less equipment needed
much less memory and bandwidth needed
massively saving energy
no software used!
*) RC = Reconfigurable Computing
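
The relation between the reported power savings and the speed-up factors can be read off the two rows above. The following is a minimal Python sketch and not part of the original slides: it uses only the four figures from the table, and the names `reported` and `savings_share` are illustrative.

```python
# Figures copied from the table: (speed-up factor, power savings factor).
reported = {
    "DNA and Protein sequencing": (8723, 779),
    "DES breaking": (28514, 3439),
}


def savings_share(speedup, power_savings):
    """Power savings factor expressed as a fraction of the speed-up factor."""
    return power_savings / speedup


for application, (speedup, power) in reported.items():
    share = savings_share(speedup, power)
    print(f"{application}: power savings are {share:.1%} of the speed-up")
```

For these two rows the power savings factor comes out at roughly 9% and 12% of the speed-up factor.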
Speed-up factors obtained by Software to Configware migration
[Figure: speed-up factors by application on a log scale from 10^0 to 10^6]
DSP and wireless: FFT 100; Reed-Solomon Decoding 2400; Viterbi Decoding 400; MAC 1000
Bioinformatics: molecular dynamics simulation 88; BLAST 52; protein identification 40; Smith-Waterman pattern matching 288; DNA seq. 8723
Astrophysics: GRAPE 20
Image processing, Pattern matching, Multimedia: SPIHT wavelet-based image compression 457; real-time face detection 6000; video-rate stereo vision 900; pattern recognition 730; CT imaging 3000
crypto: 1000; DES breaking 28500
no software!
No instruction fetch at runtime:
Abundant on-chip bandwidth available for parallelism of flexible granularity (by FPGA).
A physical signal is the simplest and fastest way of message & data transport.
Energy saving factors: ~10% of speedup
Power save factors obtained:
[Figure: speed-up factors by application (from GRAPE 20 to DES breaking 28500) on a log scale from 10^0 to 10^6]
Low Power Circuit Design: PowerOpt™ (ChipVision Design Systems) divides power consumption by up to 4
GPGPU and x86 multicore: no energy saving data available
The "Reconfigurable Computing Paradox"
Why such Speed-up Factors ...
... with FPGAs: a much worse technology!
massive wiring overhead
+ routing congestion growing with FPGA size
+ massive reconfigurability overhead
main reason: no von Neumann Syndrome!
no software! using Configware and Flowware instead
RC versus Multicore
RC: speed-up often higher by orders of magnitude? Sure!
RC: energy efficiency often much higher, or even by orders of magnitude? Sure!
We need both: Multicore and RC
this is the silver bullet
"RC" = Reconfigurable Computing
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
Paradigm Dichotomy: an old hat
"decision box turns into demultiplexer"
[Figure: a decision box (CONDITION, ENABLE, exits B0 and B1) next to a demultiplexer (CONDITION selecting 0 or 1, ENABLE, outputs B0 and B1)]
HDL scene ~1970: reductionists' tunnel view
"That's so simple! Why did it take 30 years to find out?"
1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.
1972: C. G. Bell et al: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972
1973: RTM available as DEC product
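
The correspondence between a decision box and a demultiplexer can be illustrated in a few lines. The following is a minimal Python sketch and not taken from the cited papers: the function names are illustrative, and the assignment of a true CONDITION to exit B1 is an assumption.

```python
def decision_box(enable, condition):
    """Procedural view: control enters via ENABLE and leaves by exit B0 or B1."""
    if not enable:
        return None
    return "B1" if condition else "B0"  # assumed: true condition -> B1


def demultiplexer(enable, condition):
    """Structural view: the ENABLE signal is routed to output B0 or B1."""
    b0 = enable and not condition
    b1 = enable and condition
    return b0, b1


# Both views agree for every input combination.
for enable in (False, True):
    for condition in (False, True):
        b0, b1 = demultiplexer(enable, condition)
        exit_taken = decision_box(enable, condition)
        assert (exit_taken == "B0") == b0 and (exit_taken == "B1") == b1
```

The loop at the end only checks that the procedural and the structural description select the same exit.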
Coarse-grained Reconfigurable Array
SNN filter on (supersystolic) KressArray (mainly a pipe network)
array size: 10 x 16 rDPUs
rDPU: reconfigurable Data Path Unit, 32 bits wide
no CPU
Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect
Result from configware
CoDe-X inside [Jürgen Becker]
by KressArray Xplorer [Ulrich Nageldinger]
Brick Wall in the Brain
immediately* a VIP jumps up: "But you can't implement decisions!"
Embarrassing: a top-level R&D manager of a global IT corporate group
*) discussion after the talk: RAW at Orlando, FLA
completely missing sense of Dichotomies: structural versus procedural
"But you can't implement decisions!"
Software to Configware Migration:
S = R + (if C then A else B endif);
[Figure: section of a very large pipe network with inputs A, B, R and C, a block labelled "=1", an adder "+" and output S]
it's criminal that typical CS graduates don't know this!
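
The statement above can be read structurally: the conditional becomes a selector placed in front of an adder, and streams of operands pass through this fixed structure. The following is a minimal Python sketch of that reading and not the author's tool flow: the names `select`, `pipe_section` and `run_streams` are illustrative, and a software simulation can only mimic what the hardware does in parallel.

```python
def select(c, a, b):
    """2-to-1 multiplexer: A and B are both present, C only picks one of them."""
    return a if c else b


def pipe_section(r, c, a, b):
    """Structural form of S = R + (if C then A else B endif)."""
    return r + select(c, a, b)


def run_streams(r_stream, c_stream, a_stream, b_stream):
    """Feed four data streams through the same fixed structure, item by item."""
    return [
        pipe_section(r, c, a, b)
        for r, c, a, b in zip(r_stream, c_stream, a_stream, b_stream)
    ]


print(run_streams([1, 2], [True, False], [10, 20], [100, 200]))
```

The decision is implemented without any branch in an instruction stream: the condition is just one more data input of the pipe network.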
typical programmer qualification:
CPU-centric flat world model, sequential-only mind set (Aristotelian model)
CPU-"centric" but no hardware know-how (kind of tunnel view)
CPU not visible from SE
This Software-centric world model is obsolete
The von Neumann Syndrome [C.V. Ramamoorthy]:
"Software" stands for extremely memory-cycle-hungry instruction streams
overhead piles up to code sizes of astronomic dimensions
Nathan's Law: "Software is a gas. It expands to fill its containers ..." [Nathan Myhrvold]
… until being limited by Moore's Law [& Kryder's Law]
Wirth's Law: "software is slowing faster than hardware is accelerating" [Niklaus Wirth]
"The Memory Wall": term coined by Sally McKee (& co-author)
Patterson's Law [Dave Patterson]: the bandwidth gap grows 50% / year and has reached >1000x
multiplied x
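
The two figures quoted for Patterson's Law can be related by compound growth. The following is a minimal Python sketch and not part of the original slides: it assumes a constant 50% growth per year, and the function names are illustrative.

```python
import math


def years_to_reach(gap, yearly_growth=0.50):
    """Years of compound growth needed for the gap to reach the given factor."""
    return math.log(gap) / math.log(1.0 + yearly_growth)


def gap_after(years, yearly_growth=0.50):
    """Gap factor accumulated after the given number of years."""
    return (1.0 + yearly_growth) ** years


print(f"years to reach a 1000x gap: {years_to_reach(1000):.1f}")
print(f"gap after 18 years: {gap_after(18):.0f}x")
```

Under this assumption a gap of 1000x corresponds to about 17 years of growth at 50% per year.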
To the honor of John von Neumann
he did not invent the von Neumann machine
he was a reviewer of the project
he and his co-authors gave the most concise description of the paradigm's principles
Max Planck:
Replacement of false doctrines by new insights needs 50 years:
waiting for not only the old professors but also their students to die off.
50 years Software Crisis (term by F. L. Bauer [1968])
Criticism of Software Engineering is not new:
F. L. Bauer 1968: coined the term "Software Crisis"
N. N. 1995: THE STANDISH GROUP REPORT
Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
Anthony Berglas 2008: Why it is Important that Software Projects Fail
L. Savain 2006: Why Software is bad
Peter G. Neumann 1985-2003: 216x "Inside Risks" (18 years on the inside back cover of Comm_ACM)
Parkinson's Law: bureaucracy growth independent of actual work to be done [Cyril Northcote Parkinson, 1955]
The Machine Model Dichotomy
von Neumann versus Anti-machine (data stream machine).
PE: the Generalization of Software Engineering, First Step
PE = Program Engineering
SE = Software Engineering: CPU
FE = Flowware Engineering: asM (auto-sequencing Memory)
*) do not confuse with "dataflow"!
The Systolic Array
(H. T. Kung paradigm)
Algebra experts' hobby, early 80ies
[Figure: a DPA* (pipe network) with input data streams entering and output data streams leaving, drawn over the axes "time" and "port # (location)"]
nice time/space notation: defines ... which data item at which time at which port
*) DPA = DataPath Array (array of DPUs)
A DataPath Unit (DPU) has no program counter! It's no CPU!
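
The idea of "which data item at which time at which port" can be illustrated with a small simulation. The following is a minimal Python sketch and not from the original slides: it simulates a linear systolic array for a matrix-vector product, the schedule (coefficient a[i][j] meets its operand at DPU i at time i + j) is one common textbook choice, and all names are illustrative.

```python
def systolic_matvec(a, x):
    """Simulate a linear array of DPUs computing y = a * x.

    DPU i keeps the accumulator for y[i]. The x stream enters at DPU 0 and
    moves one DPU further per time step, so item x[j] reaches DPU i at
    time i + j, where it meets the coefficient a[i][j].
    """
    rows, n = len(a), len(x)
    acc = [0] * rows          # one accumulator per DPU
    regs = [None] * rows      # data item currently held at each DPU's input
    for t in range(n + rows - 1):
        # Shift the stream by one DPU; a new item enters while any are left.
        regs = [x[t] if t < n else None] + regs[:-1]
        for i, item in enumerate(regs):
            if item is not None:
                acc[i] += a[i][t - i] * item
    return acc


print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))
```

No DPU in this model has a program counter: what each unit does is fixed, and only the arrival time and port of the data items vary.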
The Supersystolic Array
generalization of the systolic array by Rainer Kress:
simulated annealing replaces algebraic synthesis methods
[Figure: rDPU array. Legend: rDPU not used; used for routing only (rout thru only); operator and routing; port location marker; backbus connect]
"It's not our job"
[Figure: systolic array with its input and output data streams]
Machine: resources, sequencer
the Data stream machine
[Figure: an rDPA surrounded by asM blocks, with data streams entering and leaving the array]
asM: Auto-Sequencing Memory
uses data counters, no program counter
reconfigurable address generator (GAG) inside asM
implemented by distributed on-chip memory
programmed by Flowware
Locality Awareness is essential
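
How a memory can sequence itself with data counters instead of a program counter can be sketched as follows. This is a minimal Python illustration and not the actual GAG design, which the slides do not specify: the 2-D block scan, the class `AutoSequencingMemory` and the function `address_sequence` are assumptions made for the example.

```python
def address_sequence(base, rows, cols, row_stride, col_stride=1):
    """Yield the addresses of a 2-D block scan; r and c act as data counters."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c * col_stride


class AutoSequencingMemory:
    """Memory block that emits a data stream in a preconfigured address order."""

    def __init__(self, words, addresses):
        self.words = words
        self.addresses = list(addresses)  # configured before run time

    def stream(self):
        for address in self.addresses:
            yield self.words[address]


# A 4 x 4 array stored row by row; scan the 2 x 2 block starting at address 5.
memory = AutoSequencingMemory(
    words=list(range(100, 116)),
    addresses=address_sequence(base=5, rows=2, cols=2, row_stride=4),
)
print(list(memory.stream()))
```

The address pattern is fixed by configuration, so no instruction has to be fetched to decide which data item comes next.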
Twins? Von Neumann vs. datastream machine
Machine twins: different data movement
1. von Neumann: moving data between CPU cores; data transport via common memory; execution triggered by the instruction stream; strategy: moving data at run time
2. datastream machine: moving data between (r)DPU cores within an (r)DPA; data transport piped thru directly from (r)DPU to (r)DPU; execution triggered by the arrival of data (transport-triggered*); strategy: moving the locality of execution at compile time
*) Daniel Tabac, Jack Lipovski
Who moves the operand to / from the operator if not an instruction? (if not Software?)
Pipeline: transport-triggered rDPUs
[Figure: a pipeline of rDPUs between a data stream source (I/O or asM) and a data stream sink (I/O or asM); each rDPU takes its data stream from an asM or another rDPU and passes it on to an asM or another rDPU; signals labelled "ready", "accepted" and "data"]
no instructions "read data" or "write data"
asM = auto-sequencing Memory
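
A transport-triggered pipeline can be imitated with chained Python generators. This is a minimal sketch and only an analogy, not part of the original slides: generators pull items from their source, whereas the hardware is driven by handshake signals, and the names `source`, `rdpu` and `sink` are illustrative.

```python
def source(words):
    """Data stream source, standing in for I/O or an asM."""
    for word in words:
        yield word


def rdpu(operation, upstream):
    """One stage: it operates whenever a data item arrives from upstream."""
    for item in upstream:
        yield operation(item)


def sink(upstream):
    """Data stream sink, standing in for I/O or an asM."""
    return list(upstream)


# Two chained stages: double each item, then add one.
doubled = rdpu(lambda value: value * 2, source([1, 2, 3]))
result = sink(rdpu(lambda value: value + 1, doubled))
print(result)
```

No stage contains a "read data" or "write data" instruction: the connection between the stages is fixed when the pipeline is built.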
Procedural Languages Twins
imperative Software Languages       (super)systolic Flowware Languages
program counter                     data counter(s)
read next instruction               read next data item
goto (instruction address)          goto (data address)
jump to (instruction address)       jump to (data address)
instruction loop                    data loop
instruction loop nesting            data loop nesting
instruction loop escape             data loop escape
instruction stream branching        data stream branching
no: no internally parallel loops    yes: internally parallel loops
But there is the Asymmetry: data counter(s) for data parallelism
How data are moved
Software: by addresses, read from instructions; here locality is less relevant
Flowware: by wire (configured before run time); the relation to configware calls for locality awareness
Locality awareness is essential for flowware
A Heliocentric CS Model needed
Twin Paradigm Dual Dichotomy Approach.
PE = Program Engineering: The Generalization of Software Engineering
SE = Software Engineering: CPU
FE = Flowware Engineering: asM (auto-sequencing Memory)
CE = Configware Engineering: structures, pipe network model; rDPU (reconfigurable Data Path Unit), rDPA (reconfigurable Data Path Array)
time to space mapping issue
*) do not confuse with "dataflow"!
Twin Paradigm Compilation
[Figure: compilation flow]
source "program" -> automatic partitioning (CoDe-X, mid-90ies: Jürgen Becker)
Software Engineering: source program -> software compiler (instruction scheduler) -> software code -> instruction streams
Configware Engineering: source "program" -> configware compiler:
mapper (placement & routing) -> configware code -> configuration
(data) scheduler -> flowware code -> data streams
Outline
• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions
Education Revolution
by twin paradigm co-education:
traditional qualification in the time domain
+ lean qualification in the space domain
= lean hardware modeling qualification at a higher level of abstraction
Conclusion
We urgently need a Programmer Education Revolution for using Multicore - and RC* (SERUM-RC*)
We urgently need a Mead-&-Conway-dimension text book on twin-paradigm programming education
and a few new Matlab/Simulink boxes for a model-based lean instruction approach to undergraduate students
*) Reconfigurable Computing
Reinvent? (final remark)
avoid traditional tunnel views
to obtain new perspectives
rediscovery and revival of old ideas
rearrange and teach them properly
to reach promising new horizons
thank you
END
extra pages for discussion:
Hollerith
The first electrical Computer: Reconfigurable
• prototyped 1884 by Herman Hollerith
• a century before FPGA introduction
• data-stream-based
[Figure: two pictures, each labelled "here is the Look Up Table"]
The Early LUT
non-volatile configuration "memory"
field-programmable:
• manually
• or, by swapping pre-wired plug boards
60 years later: RAM available for configuration
LUT = Look-Up Table
CLB = Configurable Logic Block
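
What a look-up table does in a configurable logic block can be shown in a few lines. The following is a minimal Python sketch and not a description of any particular FPGA or of Hollerith's machine: the function `make_lut` and the bit ordering of the inputs are assumptions made for the example.

```python
def make_lut(truth_table):
    """Build an n-input logic function from its 2**n configuration bits."""
    table = tuple(truth_table)
    n_inputs = len(table).bit_length() - 1
    if len(table) != 1 << n_inputs:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != n_inputs:
            raise ValueError("wrong number of inputs")
        index = 0
        for bit in inputs:          # first input is the most significant bit
            index = (index << 1) | (1 if bit else 0)
        return table[index]

    return lut


# The same structure becomes XOR or AND purely through its configuration.
xor_gate = make_lut([0, 1, 1, 0])
and_gate = make_lut([0, 0, 0, 1])
print(xor_gate(1, 0), and_gate(1, 0))
```

Reconfiguring means swapping the table contents; the structure that reads the table stays the same.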
Hollerith became IBM (after swallowing 2 other firms)
• 1896: Herman Hollerith's Tabulating Machine Company is organized in Washington, D.C., as the world's first electric tabulating and accounting machine company.
• 1911: Hollerith's Tabulating Machine Company is merged with the Computing Scale Company of America (CSCA) and with the International Time Recording Company (ITR) to form the Computing Tabulating Recording Company (CTR),
• renamed to IBM in 1924.
• In the 1940s the instruction-stream-based von Neumann (vN) model took over.
at a US state governors' summit meeting
vN, early 1940s: ENIAC, for ballistic tables
Software: from the Mainframe
RAM history
• late 1940s: magnetic core memory
• 1968: Robert Dennard, single-transistor dynamic RAM (DRAM): the end of magnetic cores
Teaching for Change: an early martyr
D. L. Parnas (keynote): "Teaching for Change"; 10th Conf. on Software Engineering Education and Training (CSEET '97)
http://www.sigsoft.org/SEN/parnas.html
• "Turing is irrelevant"
• punished for blasphemy?
• The von Neumann model is the emulation of a tape machine (mimicking tape on RAM)
• "The von Neumann syndrome": coined ~ a decade later by Prof. C. V. Ramamoorthy (UC Berkeley), SDPS 2006, San Diego, CA
Critique of von Neumann is not new:
• Dijkstra 1968: Go To Statement Considered Harmful
• R. Hartenstein, G. Koch 1975: The universal Bus considered harmful
• Backus 1978: Can programming be liberated from the von Neumann style?
• Arvind et al. 1983: A critique of Multiprocessing the von Neumann Style
• Peter G. Neumann 1985-2003: 216x "Inside Risks" (18 years, inside back cover of Comm. ACM)
• Brad Cox 1990: Planning the Software Industrial Revolution
• L. Savain 2006: Why Software is bad …
To the honor of John von Neumann
• he did not invent the von Neumann machine
• he was a reviewer of the project
• he and his co-authors gave the most concise description of the paradigm's principles
Data meeting the Processing Unit (PU)
We have 2 choices:
• by Software: routing the data by memory-cycle-hungry instruction streams through shared memory
• by Configware: data-stream-based placement* of the execution locality (pipe network generated by configware compilation)
... explaining the RC advantage
*) before run time
The von Neumann Syndrome
• The instruction-stream-based von Neumann approach: has several von Neumann overhead phenomena per CPU!
• The data-stream-based anti machine approach: has no von Neumann bottlenecks
• the watering pot model [Hartenstein]
edu
John Hennessy: widespread confusion and competing claims, "I would be panicked if I were in industry"
• easy fix? e.g. automatically parallelizing compilation via multi-threading, and many other ad-hoc solutions?
• hastily knitted compilers for the heavy lifting?
• new types of bugs introduced
JPEG zigzag scan pattern
a datastream language example: MoPL (an animation)

*> Declarations
EastScan is
  step by [1,0]
end EastScan;
SouthScan is
  step by [0,1]
end SouthScan;
NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;
SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;
HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (reverse (HalfZigZag))

[Figure: pixel map with x and y axes; the data counter path is drawn in numbered segments 1 to 4, among them HalfZigZag and reverse (HalfZigZag)]
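The scan described by the MoPL example can be replayed in ordinary software to check which path it produces. The following Python sketch is only an illustration, not MoPL itself: all function names are hypothetical, and it assumes that uturn (reverse (HalfZigZag)) amounts to replaying the step vectors of HalfZigZag in reverse order.

```python
def half_zigzag_steps():
    """Step vectors [dx, dy] taken by HalfZigZag when started at [1, 1]."""
    x, y = 1, 1
    steps = []

    def step(dx, dy):
        nonlocal x, y
        x, y = x + dx, y + dy
        steps.append((dx, dy))

    def scan_until(dx, dy, reached):
        # "loop 8 times until <edge>": stop early once the edge is reached
        for _ in range(8):
            if reached():
                break
            step(dx, dy)

    def east_scan():
        step(1, 0)

    def south_scan():
        step(0, 1)

    def north_east_scan():
        scan_until(1, -1, lambda: y == 1)

    def south_west_scan():
        scan_until(-1, 1, lambda: x == 1)

    east_scan()
    for _ in range(3):
        south_west_scan()
        south_scan()
        north_east_scan()
        east_scan()
    return steps


def zigzag_path():
    """All [x, y] positions visited on the 8 x 8 PixMap, starting at [1, 1]."""
    half = half_zigzag_steps()
    south_west = [(-1, 1)] * 7  # SouthWestScan from [8, 1] down to [1, 8]
    # Assumption: uturn(reverse(HalfZigZag)) == HalfZigZag steps in reverse order
    all_steps = half + south_west + half[::-1]
    x, y = 1, 1
    path = [(x, y)]
    for dx, dy in all_steps:
        x, y = x + dx, y + dy
        path.append((x, y))
    return path


path = zigzag_path()
print(len(path), path[:6], path[-1])
```

Under the stated assumption the data counter visits each of the 64 positions exactly once and ends at [8,8]. No instruction fetch is involved in the scan itself: the whole program describes only how the data counter moves.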
Why a new machine paradigm?
• The anti machine as the 2nd paradigm is the key to curricular innovation.
• ... a Trojan horse to introduce data-stream-based issues to the classical mind set of programmers
• Programming by flowware instead of software is very easy to learn (... same language primitives).
• Flowware education: no fully fledged hardware expert is needed to program configware.
[Figure labels: rDPA, µprocessor]
Domains & what we need

term        | source for programming  | domain
software    | instruction streams     | time (procedural)
configware  | resources (structures)  | space (structural)
flowware    | data streams            | time (procedural)

• we need data parallelism
• we need paradigm twins
Our Contemporary Computer Machine Model (twin Paradigm Dichotomy)

machine model     | resources:    | resources:                       | sequencer:    | sequencer:                      | state register
                  | property      | programming source               | property      | programming source              |
ASIC accelerator  | hardwired     | -                                | hardwired     | -                               |
CPU               | hardwired     | -                                | programmable  | Software (instruction streams)  | program counter (in CPU)
RPU accelerator   | programmable  | Configware (configuration code)  | programmable  | Flowware (data streams)         | data counters (in RAM)

data counters: of reconfigurable address generators in asM (auto-sequencing) data memory blocks
the same language primitives!
FPGA
Machine
twins
Procedural Languages Twins

imperative Software Languages     | (super)systolic Flowware Languages
read next instruction             | read next data item
goto (instruction address)        | goto (data address)
jump to (instruction address)     | jump to (data address)
instruction loop                  | data loop
instruction loop nesting          | data loop nesting
instruction loop escape           | data loop escape
instruction stream branching      | data stream branching
no: no internally parallel loops  | yes: internally parallel loops

But there is the asymmetry: program counter versus data counter(s), for data parallelism
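The asymmetry between one program counter and several data counters can be made concrete with a small simulation. This is a minimal Python sketch under stated assumptions: data_counter and run_data_streams are hypothetical names, the memories are plain lists, and the two address generators simply advance in lockstep to model an internally parallel data loop.

```python
def data_counter(start, step, count):
    """Address generator: emits `count` data addresses, stepping by `step`."""
    address = start
    for _ in range(count):
        yield address
        address += step


def run_data_streams(mem_a, mem_b, operator):
    """Feed one fixed operator from two data streams that advance in lockstep."""
    counter_a = data_counter(0, 1, len(mem_a))                # scan A upwards
    counter_b = data_counter(len(mem_b) - 1, -1, len(mem_b))  # scan B downwards
    # No program counter selects the operator: it is fixed, only the data moves.
    return [operator(mem_a[i], mem_b[j]) for i, j in zip(counter_a, counter_b)]


print(run_data_streams([1, 2, 3, 4], [10, 20, 30, 40], lambda a, b: a + b))
```

Each data counter plays the role that the single program counter plays in software, except that several of them can run side by side on different memory blocks.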
Machine twins: different data movement

#  | moving data between / from  | data transport                               | execution triggered by                  | strategy
1  | von Neumann CPU cores       | via common memory                            | instruction stream                      | moving data at run time
2  | (r)DPU cores within (r)DPA  | piped through directly from (r)DPU to (r)DPU | arrival of data (transport-triggered*)  | moving at compile time the locality of execution

*) Daniel Tabak, Jack Lipovski
Who moves operand to operator if not an instruction? (if not Software?)
remember the Memory Wall (Patterson's Law)
time 2 space
mapping
POIIP: Loop turns into Pipeline [1979]
• loop: the loop body is executed by the CPU, working on memory
• pipeline: the loop body is mapped onto a (reconfigurable) DataPath Unit (rDPU); several rDPUs form the pipeline
• complex loop body: complex rDPU, or pipe network inside the rDPU
• nested loops: complex pipe network
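How a loop body turns into a pipeline can be illustrated by simulating both forms in software. The sketch below is an assumption-laden illustration rather than a model of real rDPU hardware: the three-operation loop body, the names run_loop, STAGES and run_pipeline, and the use of None as a pipeline bubble are all invented for this example.

```python
def run_loop(xs):
    """Time-iterative version: one CPU executes the whole loop body per item."""
    out = []
    for x in xs:
        t = x * 2
        t = t + 1
        out.append(t * t)
    return out


# The same loop body, cut into three stages (one per hypothetical rDPU).
STAGES = [lambda x: x * 2, lambda t: t + 1, lambda t: t * t]


def run_pipeline(xs, stages):
    """Clocked simulation: every stage fires in the same time step.

    None marks an empty stage (a bubble), so input items must not be None.
    """
    regs = [None] * len(stages)               # output register of each stage
    out = []
    stream = list(xs) + [None] * len(stages)  # trailing bubbles flush the pipe
    time_steps = 0
    for item in stream:
        if regs[-1] is not None:
            out.append(regs[-1])              # a result leaves the last stage
        # Update from the last stage backwards, so that each stage sees the
        # value its predecessor produced in the previous time step.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = stages[i](regs[i - 1]) if regs[i - 1] is not None else None
        regs[0] = stages[0](item) if item is not None else None
        time_steps += 1
    return out, time_steps


print(run_loop([1, 2, 3]))
print(run_pipeline([1, 2, 3], STAGES))
```

Both versions compute the same results. In the pipelined form the data items move from stage to stage while the operators stay in place, and once the pipe is full one result leaves per time step.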
double
dichotomy
Double Dichotomy

Paradigm Dichotomy (software to flowware mapping?)
• instruction stream: von Neumann (Software-Domain), time domain
• data stream: Anti Machine (Flowware-Domain), time domain

Relativity Dichotomy (time to space mapping)
• Procedure: time (Software-Domain), time domain
• Structure: space (Configware-Domain), space domain
Relativity Dichotomy (2)

time domain (procedure domain)      | space domain (structure domain)
2 phases:                           | 3 phases:
1) programming instruction streams  | 1) reconfiguration of structures
2) run time                         | 2) programming data streams
                                    | 3) run time

[Figure labels: time; space; time/space]
time-iterative to space-iterative
loop transformation methodology: 70ies and later
• a time to space mapping: n time steps on 1 CPU -> 1 time step on n DPUs
• a time to space/time mapping (strip mining [D. Loveman, J-ACM, 1977]): n*k time steps on 1 CPU -> n time steps on k DPUs
• the space dimension is limited (e.g. because of the chip size)
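The step counts of the time to space/time mapping can be checked with a small simulation. This Python sketch only counts time steps and makes no claim about real DPU timing: time_iterative and strip_mined are hypothetical names, and the k DPUs are modelled by handling one strip of up to k items per time step.

```python
def time_iterative(data, body):
    """1 CPU: one loop iteration per time step."""
    results = [body(x) for x in data]
    return results, len(data)


def strip_mined(data, body, k):
    """k DPUs: each time step handles one strip of up to k items side by side."""
    results = []
    time_steps = 0
    for start in range(0, len(data), k):
        strip = data[start:start + k]
        results.extend(body(x) for x in strip)  # conceptually in parallel
        time_steps += 1
    return results, time_steps


data = list(range(12))           # n*k = 12 items
square = lambda x: x * x
print(time_iterative(data, square)[1])   # time steps on 1 CPU
print(strip_mined(data, square, 4)[1])   # time steps on k = 4 DPUs
print(strip_mined(data, square, 12)[1])  # pure time to space mapping
```

With 12 items and k = 4 the strip-mined version needs 3 time steps instead of 12. Setting k to the full data size gives the pure time to space mapping with a single time step, which is where the limited space dimension of the chip comes in.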
Terminology

term      | program counter | execution triggered by | paradigm
CPU       | yes             | instruction fetch      | instruction-stream-based
(r)DPU**  | no              | data arrival*          | data-stream-based

*) "transport-triggered"
**) does not have a program counter
[Figure labels: DPU, program counter, CPU, rDPU]
von Neumann overhead vs. Reconfigurable Computing

overhead                      | von Neumann machine | anti machine (hardwired or reconfigurable)
instruction fetch             | instruction stream  | none*
state address computation     | instruction stream  | none*
data address computation      | instruction stream  | none*
data meet PU + other overhead | instruction stream  | none*
i/o to / from off-chip RAM    | instruction stream  | none*
inter-PU communication        | instruction stream  | none*
message passing overhead      | instruction stream  | none*

*) configured before run time
coarse
grain
Coarse-grained Reconfigurable Array
SNN filter on (supersystolic) KressArray (mainly a pipe network)
• array size: 10 x 16 rDPUs
• rDPU: reconfigurable Data Path Unit, 32 bits wide
• no CPU
• compiled by Nageldinger's KressArray Xplorer with Juergen Becker's CoDe-X inside
• note: software perspective without instruction streams: pipelining
Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect
Really so simple?
(recall this example!)
• by KressArray Xplorer [Ulrich Nageldinger], CoDe-X inside [Jürgen Becker]
• embarrassing reaction to Ulrich Nageldinger's talk at RAW 1996
Legend: rDPU not used; used for routing only (route-through only); operator and routing; port location marker; backbus connect
Brick Wall in the Brain
• immediately* a VIP jumps up: "But you can't implement decisions!"
• embarrassing: a top-level R&D manager of a global IT corporate group
• completely missing sense of dichotomies: structural vs. procedural
*) discussion after the talk: RAW at Orlando, FLA
"But you can't implement decisions!"
S = R + (if C then A else B endif);
[Figure: section of a very large pipe network with inputs A, B, R and C and an adder (+)]
Software to Configware migration:
• it's criminal that typical CS graduates don't know this!
• illustrating that mono-rail education is fatal
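The statement S = R + (if C then A else B endif) needs no branch instruction when it is seen structurally. The following Python sketch simulates such a pipe-network section under stated assumptions: the decision is modelled as a 2-to-1 multiplexer in front of the adder, and the names mux, adder and pipe_section are hypothetical, not taken from any tool mentioned on the slides.

```python
def mux(select, when_true, when_false):
    """2-to-1 multiplexer: both inputs are present, the select line picks one."""
    return when_true if select else when_false


def adder(a, b):
    return a + b


def pipe_section(r_stream, c_stream, a_stream, b_stream):
    """S = R + (if C then A else B endif), applied to four parallel data streams."""
    for r, c, a, b in zip(r_stream, c_stream, a_stream, b_stream):
        yield adder(r, mux(c, a, b))


print(list(pipe_section([1, 2], [True, False], [10, 20], [100, 200])))
```

The decision is a piece of structure (the multiplexer) rather than a jump in an instruction stream, which is the mapping from the procedural to the structural domain.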
Dual paradigm mind set: an old hat
• Software mind set: instruction-stream-based: flow chart -> control instructions
• Mapped into a hardware mind set: action box = flip-flop, decision box = (de)multiplexer (mapping from procedural to structural domain)
• 1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.
• 1972: C. G. Bell et al.: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans. C-21/5, May 1972
[Figure labels: FF, token bit, evoke]
Old Paradigms and Methodologies
• 1884: 1st mass-produced electric computer (Hollerith)
• 1946: von Neumann Machine Paradigm
• 1980: Datastreams (Kung, Leiserson)
• 1984: 1st FPGA to market (Xilinx)
• 1989: Anti Machine** Paradigm (TU-KL)
• 1990: first rDPA* (Rabaey)
• 1994: higher Anti Machine** Programming Language (Flowware: TU-KL)
• 1995: super systolic array: rDPA (Kress)
• 1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...
• 1997+: Discipline of Distributed Memory Architectures (IMEC …)
• 1997: 1st automatically partitioning Configware/Software Co-Compiler (TU-KL)
*) rDPA = reconfigurable Data Path Array
**) datastream machine (flowware machine): no "dataflow machine"!!
Loop Transformation Examples
sequential processes: resource parameter driven Co-Compilation
• original loop: loop 1-16 body endloop
• strip mining: fork; loop 1-8 body endloop alongside loop 9-16 body endloop; join
• loop unrolling: loop 1-8 body body endloop
• host / reconf. array: trigger loops (loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop)
The impact of shifting to multicore
4 P issues:
• performance
• programmer productivity
• program efficiency
• power consumption
market trends
CV of Reiner Hartenstein
• Professor (ordinarius emeritus), TU Kaiserslautern
• All academic degrees from KIT Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)
• Credited to be "The father of Reconfigurable Computing" (also pre-FPGA era) [1]
• Creator of KARL [2], a Pascalish hardware language: most successful [3] trailblazer HDL before VHDL came up
• EU grant (80ies), 85 mio ECU (pre-€): complete EDA framework [4,5] around KARL
• 1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)
• 1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution: the multi-university "E.I.S. project" (gov. grant: 38 million Deutschmark)
• IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards
• Founder / co-founder of several international annual conference series
• his hobby: giving keynotes, see http://hartenstein.de/keynotes.htm
[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)
[2] R. Hartenstein: Fundamentals of Structured Hardware Design; American Elsevier, 1977 -- Bestseller; 1977 & later used as a textbook at UC Berkeley (not only here)
[3] for users, usage details, quotations, etc. see: http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html
[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html
[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator, embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [4]
Conclusions
• key issues: performance and energy consumption of programs
• need to master the hetero of all 3: Singlecore, Multicore, & Reconfigurable Computing
• additional Flowware / Configware skills are essential qualifications for programmers
• hetero tools, environments and lab courses are a cardinal problem
• massive long-term R&D funding required, as known from DARPA
• a Mead-&-Conway-style SE Revolution toward twin-paradigm education is urgently needed
Conclusions (2)
To maintain a Booming Multicore Era: Not without Reconfigurable Computing!
possible for 2 or 3 more decades?
[Figure: relative performance vs. year (2004 to 2030), marking the end of the single-core era]
Energy Cost of Computing
Immense energy consumption of the internet (2005)
• NY city server farms: 1/4 km2 building floor area
• Amsterdam's electricity consumption: 25% to server farms
• Google, Microsoft ...: huge datacenters at the Columbia River, and ORNL benefits from the Tennessee Valley Authority.
• Google's annual electricity bill: > 50,000,000 $ (in 2005*)
• Google: patent for a "water-based data center" using the ocean to provide power and cooling (Pelamis Wave Energy Converter).
*) when the Brent oil price was around 40 $
The impact of Reconfigurable Computing and its potential to rescue us from the coming severe energy crisis
In contrast to traditional computing by software-driven CPUs, Reconfigurable Computing offers an overwhelming reduction of electricity consumption, as well as massive speed-up factors: both by up to several orders of magnitude.
We are not aware of the rapidly growing, immense electricity consumption of all computers, directly visible or embedded in all kinds of devices, appliances, machines, facilities, complexes, and other computer-based cyber infrastructures.
For the internet alone, an increase by a factor of 30 by the year 2030 has been predicted [107] "if the trend continues". This means a much higher electricity consumption than that of the entire world today.
This trend must not continue, since it is unaffordable. However, to avoid a breakdown of the world economy we need these cyber infrastructures.
The climate protection scene completely ignores these highly dramatic electricity consumption predictions.
Only Reconfigurable Computing can prevent the running of these infrastructures from becoming unaffordable in the future.
This is very urgent, and we have to complete our rescue actions much earlier than 2030.
For our modern civilization both are essential survival issues: Reconfigurable Computing (RC), as well as the tremendous electricity consumption of computing. This should find priority attention by the media as soon as possible.
Reconfigurable Computing: why does it have the potential to save us from the future disaster, what problems have to be solved, and what campaign of actions is needed?
To prepare, to organize, and to implement the extensive rescue actions needed will take a lot of time and effort. For this reason we cannot afford any delay in placing a widely noticed alarm signal in our mass media.
Parkinson's Laws & Hack's Law (an animation)
• Parkinson's Law: work expands to fill the time available for its completion
• Parkinson's Law: data expands to fill the space available for storage
• Hack's Law: ... overflow the ... and spill on the floor, leaving a very sticky mess
Growth needed beyond Moore's Law
[Figure: relative performance (log scale, 10^3 to 10^13) vs. year (1970 to 2030), marking the end of the single-core era]
Growing number of wireless ICT features*, creating demand for software performance
• beamer interface, micro beamer
• video recording
• smart scanner, text recognition, speech recognition
• music, TV, photo, radio, remote conferencing
• navigator
• 5 megapixel, zoom, autofocus, face recognition, smart features
• wireless books
• portable TV
*) only a few examples
The end of the GHz race
[Figure labels: the end of the single-core era; number of transistors; processor cores; doubles every year; 18 months]
Simple KressArray Configuration Example