
Conference opening keynote, IV Southern Programmable Logic Conference (SPL 2010), 24-26 March 2010, Porto Galinhas Beach, Ipojuca, Pernambuco, Brasil

Reiner Hartenstein, TU Kaiserslautern, Germany
http://hartenstein.de
© 2010 (handout dated 28 April 2010)

Reconfigurable Computing: Boosting Software Education for the Multicore Era

Why we need to reinvent computing

Preface: survival problems of our computer-based infrastructure

• Its energy consumption may become unaffordable.
• The qualified programmer population required here does not yet exist (next to this, the many-core crisis looks like a minor problem).
• This requires us to reinvent computing.
• This may massively crash the world economy.
• Disaster prevention requires huge efforts.

Computers unaffordable:

• earn billions (in investment banking?) and you can pay for it
• or go bankrupt

Outline

• Why we need computers
• Energy consumption: unaffordable soon?
• The many-core crisis
• Rescue by Reconfigurable Computing?
• We need to Reinvent Computing
• Conclusions

Why Computers are important

• BANKS
• BUSINESS
• BUSINESS INFORMATION SYSTEMS
• BIOLOGY AND MEDICAL SCIENCE
• EDUCATION
• MEDIA, TRAVEL AND TICKETING
• WEATHER PREDICTIONS
• SPORTS
• DAILY LIFE
• EMBEDDED
• INTERNET

Cars made without computers

• no variations
• no extra accessories
• only black

Robotics? Impossible without computers.

Banking without computers? Back to the Roaring Twenties?

Business Information Systems without computers: Lufthansa anno 1960

Without computers: Homo sapiens? Homo neanderthalensis?

The infrastructure: to keep it intact, to keep it unbroken,

• we need to reinvent computing ...
• ... and to revolutionize programmer education

Software is an ugly term ...

• ... it stands for a narrow-minded, CPU-centric world model of CS
• we need an approach using less software

Energy consumption: unaffordable soon?

Green Computing

• One example: efficient power supplies. Raising efficiency from 60-70% to 90% or better cuts energy losses by a factor of 4.
• 250 watts, up to 380 watts (http://forum.00de.de/archive/konsolen-und-videospiele/playstation-3-bekommt-import-verbot-fuer-europa-t-37867.html)
• Green computing is really needed!
• ... but it is not the silver bullet
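The factor of 4 quoted for power supplies can be checked with a few lines of arithmetic. The sketch below assumes that "loss" means the share of input power turned into heat (1 minus efficiency); the function names are illustrative and not taken from the talk.

```python
def loss_fraction(efficiency: float) -> float:
    """Share of the input power that is dissipated as heat (assumed loss model)."""
    return 1.0 - efficiency


def loss_reduction(old_efficiency: float, new_efficiency: float) -> float:
    """Factor by which losses shrink when the efficiency is raised."""
    return loss_fraction(old_efficiency) / loss_fraction(new_efficiency)


for old in (0.60, 0.70):
    factor = loss_reduction(old, 0.90)
    print(f"{old:.0%} -> 90%: losses cut by a factor of {factor:.1f}")
```

Under this assumption the factor of 4 corresponds to the 60% starting point; starting from 70% the same upgrade gives a factor of 3.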

Don't need the PS3: already have a grill.

~1999: PCs are coming ...

Peter W. Huber, Mark P. Mills: Dig more coal -- the PCs are coming; 05.31.99
http://www.forbes.com/forbes/1999/0531/6311070a.html

[1989 from a student at Kaiserslautern]

Never run out of energy?

• 2007: 80% of crude oil is coming from fields in decline
• natural gas: similar situation
• "6 more Saudi Arabias needed for demand predicted for 2030"
• Source: [Fatih Birol, Chief Economist IEA]. https://www.theoildrum.com/

[Figure: typical oil field operation, production (%) from 0 to 100; chart labels: coal, hydro, nuclear, gas, oil, > 30 %, ~ 55 %]

Oil crisis in 1973: weekend ban on driving

Oil crisis in 1979/80: weekend ban on driving

Crude oil prices in US-$ per barrel

[Figure: price chart up to 2010, axis up to 160 US-$]

• 14 March 2010: > 82 US-$

Power Consumption of Computers

• Energy cost may overtake IT equipment cost in the near future [Albert Zomaya]
• Power consumption has become an industry-wide issue: incremental improvements are on track, yet "we may ultimately need revolutionary new solutions" [Horst Simon, LBNL, Berkeley]
• Power consumption by the internet: x30 by 2030 if trends continue [G. Fettweis, E. Zimmermann: ICT Energy Consumption - Trends and Challenges; WPMC'08, Lapland, Finland, 8-11 Sep 2008]
• "Google causes 2% of the world's electricity consumption" (Google denied)
• (~90% paid by customers?)

[Photo: at Dallas] [Randy Katz: IEEE Spectrum, Febr. 2009]
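To get a feeling for the "x30 by 2030" figure, it helps to convert it into a yearly growth rate. The sketch below assumes constant compound growth and takes 2008, the year of the cited paper, as the baseline; both are assumptions made for illustration, not statements from the slide.

```python
def implied_annual_growth(total_factor: float, years: int) -> float:
    """Constant yearly growth rate that compounds to total_factor over the given years."""
    return total_factor ** (1.0 / years) - 1.0


# Assumption: baseline year 2008 (publication year of the cited paper).
rate = implied_annual_growth(30.0, 2030 - 2008)
print(f"x30 over 22 years is roughly {rate:.1%} growth per year")
```

With these assumptions the trend amounts to somewhat under 17% growth per year.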

vN: a Massive Power Guzzler

• Software often has very bad performance: it's a symptom of the von Neumann Syndrome.
• Software is extremely power-hungry because of its massively memory-cycle-hungry instruction streams.
• We need an approach using less software: the twin paradigm.

The many-core crisis

Single-core approach: Software Performance

[Figure: performance from 10^3 to 10^9 (log scale) over the years 1970 to 2008]

• a free ride on Moore's Law
• the burden of software performance is the task of chip designers*

*) the Mead-&-Conway-created population

The end of the GHz race

• the end of the single-core era
• ... it's a power issue

The End of single-core

[Figure: performance from 10^3 to 10^9 (log scale) over the years 1970 to 2008, marking the end of the single-core era]

Growth beyond Single-core?

[Figure: relative performance from 10^3 to 10^13 (log scale) over the years 1970 to 2030, marking the end of the single-core era]

• "Multicore shifts the burden of Performance from Chip Designer to Software Developers." [J. Larus: Spending Moore's Dividend; C_ACM, May 2009]
• we need to learn parallel programming
• ... performance drops, productivity & other problems ...
• the current SE population is not qualified

Performance Growth by Multicore?

[Figure: relative performance over the years 1994 to 2030, marking the beginning of the multicore era]

• ... & massive programmer productivity problems
• von-Neumann-only is not the silver bullet
• Reconfigurable Computing is indispensable!

Amid the Clamor

Michael Wrinn, a senior course architect in the Intel Software College (keynote at SIGCSE 2010): "Suddenly, All Computing Is Parallel: Seizing Opportunity Amid the Clamor"
http://www.sigcse.org/sigcse2010/attendees/keynotes.php

• "Foundational change will disrupt traditional habits throughout the discipline ...."
• "The proud era of von Neumann architecture passes into history."
• bring parallel computing into the mainstream of undergraduate education
• our goal: twin paradigm

Amid the Clamor?

• Michael Wrinn, "Seizing Opportunity Amid the Clamor": Michael has the solution?

Multimedia in the Multicore Era

Multimedia Performance Needs [Pierre Paulin, MPSoC'09]

application    performance needs up to
Audio          800 MIPS
Graphics       11 GOPS
Video          160 GOPS
Digital TV     900 GOPS

• needed performance is growing faster than Moore's law

[Figure: relative performance over the years 1994 to 2030, marking the beginning of the multicore era]
[Figure, courtesy E. Sanchez: MIPS needed by GSM, GPRS, EDGE, UMTS and the next standard]

ICT market at an inflection point

Cowhey's & Aronson's Law:

• Prosperity depends on network capacity, ..., efficient pricing, flexible platforms, & ...
• Broadband is significant at the inflection point, prompting major market governance changes.
• The battle for the living room & mobile is more important than the PC market.
• ... Cheap Revolution: affordable broadband, software performance, low power (twin paradigm)

(Senior Counselor to the U.S. Trade Representative (USTR) on strategy and negotiations.)

Rescue by Reconfigurable Computing?

RC*: Demonstrating the intensive Impact

SGI Altix 4700 with RC 100 RASC compared to Beowulf cluster [Tarek El-Ghazawi et al.: IEEE COMPUTER, Febr. 2008]

Application                   Speed-up factor   Savings: Power   Cost   Size
DNA and Protein sequencing    8723              779              22     253
DES breaking                  28514             3439             96     1116

• much less equipment needed
• much less memory and bandwidth needed
• massively saving energy
• no software used!

*) RC = Reconfigurable Computing

Speed-up factors obtained by Software to Configware migration

[Figure: speed-up factors on a log scale from 10^0 to 10^6]

DSP and wireless:
• Reed-Solomon decoding: 2400
• MAC: 1000
• Viterbi decoding: 400
• FFT: 100

Bioinformatics:
• DNA seq.: 8723
• Smith-Waterman pattern matching: 288
• molecular dynamics simulation: 88
• BLAST: 52
• protein identification: 40

Astrophysics:
• GRAPE: 20

Image processing, pattern matching, multimedia:
• real-time face detection: 6000
• CT imaging: 3000
• video-rate stereo vision: 900
• pattern recognition: 730
• SPIHT wavelet-based image compression: 457

Crypto:
• DES breaking: 28500
• crypto: 1000

• No instruction fetch at runtime: no software!
• Abundant on-chip bandwidth is available for parallelism of flexible granularity (by FPGA).
• A physical signal is the simplest and fastest way of message & data transport.

Energy saving factors: ~10% of speedup

[Figure: the speed-up chart of the previous slide (same applications and factors), now headed "power save factors obtained"]

• Low power circuit design: PowerOpt™ (ChipVision Design Systems) divides power consumption by up to 4
• GPGPU and x86 multicore: no energy saving data available
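The "~10% of speedup" rule of thumb can be compared with the two data points that the El-Ghazawi table in this talk gives for both speed-up and power savings. The sketch below only divides the quoted numbers; the dictionary layout and function name are illustrative.

```python
# (speed-up factor, power savings factor) as quoted from El-Ghazawi et al. in this talk
reported = {
    "DNA and protein sequencing": (8723, 779),
    "DES breaking": (28514, 3439),
}


def power_to_speedup_ratio(speedup: float, power_saving: float) -> float:
    """Power savings factor expressed as a share of the speed-up factor."""
    return power_saving / speedup


ratios = {app: power_to_speedup_ratio(*factors) for app, factors in reported.items()}
for app, ratio in ratios.items():
    print(f"{app}: power saving is about {ratio:.0%} of the speed-up")
```

Both quoted applications land close to the 10% mark, one slightly below and one slightly above.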

Why such Speed-up Factors ... with FPGAs: a much worse technology!

The "Reconfigurable Computing Paradox":

• massive wiring overhead
• + routing congestion growing with FPGA size
• + massive reconfigurability overhead

Main reason: no von Neumann Syndrome! No software: Configware and Flowware are used instead.

RC versus Multicore

• RC: is the speed-up often higher by orders of magnitude? Sure!
• RC: is the energy efficiency often much higher, or even higher by orders of magnitude? Sure!
• We need both, Multicore and RC: this is the silver bullet.

"RC" = Reconfigurable Computing

We need to Reinvent Computing

Paradigm Dichotomy: an old hat

• "decision box turns into demultiplexer"
• "That's so simple! Why did it take 30 years to find out?"
• HDL scene ~1970: reductionists' tunnel view

[Figure: a decision box with CONDITION, ENABLE and the branches B0 and B1, next to the equivalent demultiplexer with CONDITION, ENABLE and the outputs B0 and B1]

• 1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.
• 1972: C. G. Bell et al.: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972
• 1973: RTM available as a DEC product

Coarse-grained Reconfigurable Array

SNN filter on a (supersystolic) KressArray (mainly a pipe network)

• array size: 10 x 16 rDPUs
• rDPU = reconfigurable Data Path Unit, 32 bits wide
• no CPU
• result from configware, by KressArray Xplorer [Ulrich Nageldinger], CoDe-X inside [Jürgen Becker]

[Figure legend: rDPU not used; rDPU used for routing only (route-through only); operator and routing; port location marker; backbus connect]

Brick Wall in the Brain

• Immediately* a VIP jumps up: "But you can't implement decisions!"
• Embarrassing: a top-level R&D manager of a global IT corporate group
• a completely missing sense of dichotomies: structural vs. procedural

*) discussion after the talk: RAW at Orlando, FLA

"But you can't implement decisions!"

Software to Configware Migration:

S = R + (if C then A else B endif);

[Figure: section of a very large pipe network with the inputs A, B, R and C, a selector, an adder (+) and the output S]

It's criminal that typical CS graduates don't know this!
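The slide's example S = R + (if C then A else B endif) can be written in both styles to show that a decision does not need a branch. The sketch below is only an illustration in software: the function names are made up, and the branch-free arithmetic in mux merely stands in for a hardware selector that has both inputs wired in permanently.

```python
def s_procedural(r: int, c: bool, a: int, b: int) -> int:
    """Time domain: the condition decides which statement is executed next."""
    if c:
        return r + a
    return r + b


def mux(select: bool, when_true: int, when_false: int) -> int:
    """Space domain: both inputs are always present, select only picks one of them."""
    s = int(select)
    return s * when_true + (1 - s) * when_false


def s_structural(r: int, c: bool, a: int, b: int) -> int:
    """Pipe network view: a selector feeding an adder, no control flow."""
    return r + mux(c, a, b)


print(s_procedural(10, True, 1, 2), s_structural(10, True, 1, 2))
print(s_procedural(10, False, 1, 2), s_structural(10, False, 1, 2))
```

Both versions compute the same S for every input; only the place where the decision lives has moved from control flow to data path.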

CPU-centric flat world model (Aristotelian model)

Typical programmer qualification:

• sequential-only mind set
• CPU-"centric", but no hardware know-how (a kind of tunnel view)
• the CPU is not visible from SE

This software-centric world model is obsolete.

The von Neumann Syndrome [C.V. Ramamoorthy]

"Software" stands for extremely memory-cycle-hungry instruction streams, multiplied by:

• "The Memory Wall" (term coined by Sally McKee & co-author)
• Patterson's Law [Dave Patterson]: the bandwidth gap grows 50% / year and has reached > 1000x
• Wirth's Law: "software is slowing faster than hardware is accelerating" [Niklaus Wirth]
• Nathan's Law: "Software is a gas. It expands to fill its containers ..." [Nathan Myhrvold] ... until being limited by Moore's Law [& Kryder's Law]
• overhead piles up to code sizes of astronomic dimensions
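The two numbers quoted for Patterson's Law (50% growth per year, a gap of more than 1000x) can be related by a short compound-growth calculation. The sketch assumes a constant 50% yearly growth starting from a gap of 1; the function name is illustrative.

```python
import math


def years_to_reach(gap: float, yearly_growth: float) -> float:
    """Years of constant compound growth needed for the gap to reach the given size."""
    return math.log(gap) / math.log(1.0 + yearly_growth)


years = years_to_reach(1000.0, 0.50)
print(f"a 1000x gap takes about {years:.1f} years at 50% per year")
```

Under that assumption a 1000x gap corresponds to roughly 17 years of growth.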

To the honor of John von Neumann

• he did not invent the von Neumann machine
• he has been a reviewer of the project
• he and his co-authors gave the most concise description of the paradigm's principles

Software Engineering criticism is not new: 50 years of Software Crisis

Max Planck: the replacement of false doctrines by new insights needs 50 years, waiting for not only the old professors but also their scholars to die off.

• F. L. Bauer 1968: coined the term "Software Crisis"
• Peter G. Neumann 1985-2003: 216x "Inside Risks" (18 years on the inside back cover of Comm_ACM)
• N. N. 1995: THE STANDISH GROUP REPORT
• Robert N. Charette 2005: Why Software Fails; IEEE Spectrum, Sep 2005
• L. Savain 2006: Why Software is bad
• Anthony Berglas 2008: Why it is Important that Software Projects Fail
• Parkinson's Law: bureaucracy growth is independent of the actual work to be done [Cyril Northcote Parkinson, 1955]

The Machine Model Dichotomy

PE: the Generalization of Software Engineering, First Step

von Neumann versus anti-machine (data stream machine):

PE (Program Engineering)
• SE (Software Engineering): CPU
• FE (Flowware* Engineering): asM (auto-sequencing Memory)

*) do not confuse with "dataflow"!

The Systolic Array (H. T. Kung paradigm)

[Figure: a DPA* (pipe network) with an input data stream and output data streams, each drawn over the axes time and port # (location)]

• a nice time/space notation: it defines which data item arrives at which time at which port
• an algebra experts' hobby in the early 1980s
• a DataPath Unit has no program counter: it's no CPU!

*) DPA = DataPath Array (array of DPUs)
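The time/space notation ("which data item at which time at which port") can be made concrete with a small schedule generator. The sketch below assumes the skewed feed that is common in systolic array examples, where stream j enters at port j and is delayed by j time steps; this convention and all names are assumptions for illustration, not taken from the slide.

```python
def skewed_schedule(n_items: int, n_ports: int) -> dict[tuple[int, int], tuple[int, int]]:
    """Map (time step, port) to (item index, stream index).

    Assumption: item i of stream j arrives at port j at time i + j.
    """
    return {(i + j, j): (i, j) for j in range(n_ports) for i in range(n_items)}


def render(schedule: dict[tuple[int, int], tuple[int, int]], n_items: int, n_ports: int) -> str:
    """Draw the time/space diagram: one row per time step, one column per port."""
    rows = []
    for t in range(n_items + n_ports - 1):
        rows.append(" ".join("x" if (t, p) in schedule else "-" for p in range(n_ports)))
    return "\n".join(rows)


schedule = skewed_schedule(3, 3)
print(render(schedule, 3, 3))
```

Each "x" marks a data item arriving at a port, each "-" an idle slot, which is the kind of pattern the slide's diagram uses for its data streams.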

The Supersystolic Array

• a generalization of the systolic array by Rainer Kress: simulated annealing replaces algebraic synthesis methods

[Figure legend: rDPU not used; rDPU used for routing only (route-through only); operator and routing; port location marker; backbus connect]

"It's not our job"

[Figure: the systolic array with its input and output data streams]

Machine: resources and sequencer

The Data Stream Machine

[Figure: an rDPA surrounded by asM blocks at the ends of its input and output data streams]

• asM = Auto-Sequencing Memory: uses data counters, no program counter
• implemented by distributed on-chip memory
• a reconfigurable address generator (GAG) inside the asM
• programmed by Flowware
• locality awareness is essential
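The idea of a memory that sequences itself with a data counter instead of a program counter can be imitated in a few lines. The sketch below is a loose software model under stated assumptions: the block-scan pattern, the class AutoSequencingMemory and the function scan_addresses are hypothetical names chosen for illustration, not the actual GAG design.

```python
from typing import Iterator


def scan_addresses(base: int, rows: int, cols: int, row_stride: int) -> Iterator[int]:
    """Model of a data counter: steps through a rows x cols block of addresses."""
    for r in range(rows):
        for c in range(cols):
            yield base + r * row_stride + c


class AutoSequencingMemory:
    """Memory bank that emits a data stream in the order given by its address generator."""

    def __init__(self, cells: list[int]) -> None:
        self.cells = cells

    def stream(self, addresses: Iterator[int]) -> Iterator[int]:
        for address in addresses:
            yield self.cells[address]


memory = AutoSequencingMemory(list(range(100, 116)))  # a 4 x 4 array stored row by row
block = list(memory.stream(scan_addresses(base=5, rows=2, cols=2, row_stride=4)))
print(block)
```

The scan parameters play the role of the configuration that is fixed before run time; the consumer of the stream never computes an address itself.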

Twins? Von Neumann vs. data stream machine

Machine twins: different data movement. Who moves the operand to / from the operator, if not an instruction, if not software?

#   moving data between          data transport                  execution triggered by                    strategy
1   von Neumann CPU cores        via common memory               instruction stream                        moving data at run time
2   (r)DPU cores within (r)DPA   piped through directly from     arrival of data (transport-triggered*)    moving the locality of execution
                                 (r)DPU to (r)DPU                                                          at compile time

*) Daniel Tabac, Jack Lipovski

Pipeline: transport-triggered

[Figure: a chain of rDPUs between a data stream source and a data stream sink; each connection carries the signals ready, accepted and data]

• pipeline: the data stream source is I/O or an asM; the data stream sink is I/O or an asM
• single rDPU: the data stream source is an asM or another rDPU; the data stream sink is an asM or another rDPU
• no instructions "read data" or "write data"

asM = auto-sequencing Memory
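A transport-triggered pipeline can be imitated in software with chained generators: each stage does its work when a data item arrives from upstream, and no stage issues explicit "read data" or "write data" operations. This is only an analogy under stated assumptions; the names rdpu and pipeline and the three sample operations are made up for illustration.

```python
from typing import Callable, Iterable, Iterator


def rdpu(operation: Callable[[int], int], upstream: Iterable[int]) -> Iterator[int]:
    """One stage: fires once per arriving data item and passes the result downstream."""
    for item in upstream:
        yield operation(item)


def pipeline(source: Iterable[int], operations: list[Callable[[int], int]]) -> Iterator[int]:
    """Wire the stages together once, before any data flows."""
    stream: Iterable[int] = source
    for operation in operations:
        stream = rdpu(operation, stream)
    return iter(stream)


source = [1, 2, 3, 4]  # stands in for a data stream coming from I/O or an asM
sink = list(pipeline(source, [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]))
print(sink)
```

The wiring step corresponds to configuration before run time; afterwards the data items simply stream through the stages.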

Procedural Languages Twins

imperative Software Languages        (super)systolic Flowware Languages
program counter                      data counter(s)
read next instruction                read next data item
goto (instruction address)           goto (data address)
jump to (instruction address)        jump to (data address)
instruction loop                     data loop
instruction loop nesting             data loop nesting
instruction loop escape              data loop escape
instruction stream branching         data stream branching
no: no internally parallel loops     yes: internally parallel loops

But there is the asymmetry: program counter versus data counter(s) for data parallelism.

Locality awareness is essential for flowware

How data are moved:

• Software: by addresses, read from instructions (here locality is less relevant)
• Flowware: by wire, configured before run time (the relation to configware calls for locality awareness)

A Heliocentric CS Model needed

Twin Paradigm Dual Dichotomy Approach. PE (Program Engineering): the Generalization of Software Engineering

• SE (Software Engineering): CPU
• FE (Flowware* Engineering): asM (auto-sequencing Memory)
• CE (Configware Engineering): structures, pipe network model; rDPU (reconfigurable Data Path Unit), rDPA (reconfigurable Data Path Array)
• time to space mapping issue

*) do not confuse with "dataflow"!

Twin Paradigm Compilation

[Figure: compilation flow]

• source "program" -> automatic partitioning (CoDe-X, mid-1990s: Jürgen Becker)
• Software Engineering: source program -> software compiler (instruction scheduler) -> software code -> instruction streams
• Configware Engineering: source "program" -> configware compiler
  mapper (placement & routing) -> configware code -> configuration
  data scheduler -> flowware code -> data streams

Conclusions

Education Revolution

By twin paradigm co-education:

• traditional qualification in the time domain
• + lean qualification in the space domain
• = lean hardware modeling qualification at a higher level of abstraction

Conclusion

Programmer Education Revolution for using Multicore - and RC* (SERUM-RC*)

• We urgently need a Mead-&-Conway-dimension text book on twin-paradigm programming education
• and a few new Matlab/Simulink boxes for a model-based lean instruction approach to undergraduate students

*) Reconfigurable Computing


Reinvent? (final remark)

avoid traditional tunnel views

to obtain new perspectives

rediscovery and revival of old ideas

rearrange and teach them properly

to reach promising new horizons


thank you


END


extra pages

for discussion:


Hollerith


The first electric computer: reconfigurable

• prototyped 1884 by Herman Hollerith

• a century before FPGA introduction

• data-stream-based

[Figure: Hollerith's machine, with the callout "here is the Look-Up Table"]

The Early LUT

non-volatile configuration "memory", field-programmable:

• manually

• or by swapping pre-wired plug boards

60 years later: RAM available for configuration

LUT: Look-Up Table

CLB: Configurable Logic Block
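The look-up-table idea on this slide can be sketched in a few lines. This is only an illustration, not Hollerith's plug-board wiring and not any vendor's FPGA primitive: a k-input LUT is modelled as a 2^k-entry truth table, and "configuring" it means loading different table contents. The names `make_lut` and `lut_eval` are hypothetical, chosen for this sketch.

```python
from itertools import product

def make_lut(func, k):
    """Build the configuration: one output bit per input combination.

    Entries are ordered so that the first input is the most significant
    address bit.
    """
    return [func(*bits) & 1 for bits in product((0, 1), repeat=k)]

def lut_eval(table, *bits):
    """Evaluate by indexing the table; the inputs form the address."""
    addr = 0
    for b in bits:
        addr = (addr << 1) | (b & 1)
    return table[addr]

# Two different "configurations" of the same 3-input LUT structure.
xor3 = make_lut(lambda a, b, c: a ^ b ^ c, 3)
maj3 = make_lut(lambda a, b, c: (a & b) | (a & c) | (b & c), 3)
```

The structure (an indexed table) stays fixed; only the stored bits change, which is the sense in which the slide calls the plug board a configuration "memory".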


Hollerith became IBM

• 1896: Herman Hollerith's Tabulating Machine Company organized in Washington, D.C., as the world's first electric tabulating and accounting machine company.

• In 1911, Hollerith's Tabulating Machine Company* was merged with the Computing Scale Company of America (CSCA) and with the International Time Recording Company (ITR) to form the Computing Tabulating Recording Company (CTR),

• renamed to IBM in 1924.

• In the 1940s the vN model took over: instruction-stream-based

*) after swallowing 2 other firms

at US state governors' summit meeting

vN early 40s: ENIAC: for ballistic tables

Software: from the Mainframe

RAM history

late 40s, magnetic core memory

1968, Robert Dennard (single-transistor dynamic RAM): end of magnetic cores.


Teaching for Change: an early martyr

D. L. Parnas (keynote): "Teaching for Change"; 10th Conf. Softw. Engineering Education and Training (CSEET '97), http://www.sigsoft.org/SEN/parnas.html

punished for blasphemy?

"Turing is irrelevant"

The von Neumann model is the emulation of a tape machine (mimicking tape on RAM)

"The von Neumann syndrome": coined ~ a decade later by Prof. C.V. Ramamoorthy (UC Berkeley), SDPS 2006, San Diego, CA

Critique of von Neumann is not new:

• Dijkstra 1968: The Goto considered harmful

• R. Hartenstein, G. Koch 1975: The universal Bus considered harmful

• Backus 1978: Can programming be liberated from the von Neumann style?

• Arvind et al., 1983: A critique of Multiprocessing the von Neumann Style

• Peter G. Neumann 1985-2003: 216x "Inside Risks" (18 years inside back cover of Comm. ACM)

• Brad Cox 1990: Planning the Software Industrial Revolution

• L. Savain 2006: Why Software is bad …


To the honor of John von Neumann

he did not invent the von Neumann machine

he has been a reviewer of the project

he and co-authors gave the most concise description of the paradigm's principles

Data meeting the Processing Unit (PU) ... explaining the RC advantage

We have 2 choices:

• by Software (data): routing the data by memory-cycle-hungry instruction streams thru shared memory

• by Configware (PU): data-stream-based: placement* of the execution locality (pipe network generated by configware compilation)

*) before run time


The von Neumann Syndrome

The instruction-stream-based von Neumann approach: the watering pot model [Hartenstein]; has several von Neumann overhead phenomena per CPU!

The data-stream-based anti machine approach: has no von Neumann bottle-necks


edu


John Hennessy: widespread confusion and competing claims, "I would be panicked if I were in industry"

easy fix? e. g. automatically parallelizing compilation via multi-threading, and many other ad-hoc solutions?

Hastily knitted compilers for the heavy lifting?

new types of bugs introduced


JPEG zigzag scan pattern

a datastream language example (MoPL example; an animation)

*> Declarations

EastScan is
  step by [1,0]
end EastScan;

SouthScan is
  step by [0,1]
end SouthScan;

NorthEastScan is
  loop 8 times until [*,1]
    step by [1,-1]
  endloop
end NorthEastScan;

SouthWestScan is
  loop 8 times until [1,*]
    step by [-1,1]
  endloop
end SouthWestScan;

HalfZigZag is
  EastScan
  loop 3 times
    SouthWestScan
    SouthScan
    NorthEastScan
    EastScan
  endloop
end HalfZigZag;

goto PixMap[1,1]
HalfZigZag;
SouthWestScan
uturn (reverse (HalfZigZag))

[Figure: pixel map with x and y axes; the data counter traces the scan in 4 numbered segments: HalfZigZag, ..., reverse (HalfZigZag)]
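For readers who do not know MoPL, the address sequence that the data counter walks through in the zigzag example can be imitated with an ordinary generator. This is a sketch under assumptions, not a translation of the MoPL program: it uses 1-based (x, y) addresses with x growing eastward and y growing southward, as the step vectors on the slide suggest, and the name `zigzag_addresses` is hypothetical.

```python
def zigzag_addresses(n=8):
    """Yield (x, y) addresses, 1-based, in JPEG zigzag order for an n x n block."""
    x, y = 1, 1
    going_up = True  # True: travelling north-east; False: travelling south-west
    for _ in range(n * n):
        yield (x, y)
        if going_up:
            if x == n:        # right edge reached: one step south, then turn
                y += 1
                going_up = False
            elif y == 1:      # top edge reached: one step east, then turn
                x += 1
                going_up = False
            else:             # step by [1,-1]
                x, y = x + 1, y - 1
        else:
            if y == n:        # bottom edge reached: one step east, then turn
                x += 1
                going_up = True
            elif x == 1:      # left edge reached: one step south, then turn
                y += 1
                going_up = True
            else:             # step by [-1,1]
                x, y = x - 1, y + 1

scan = list(zigzag_addresses(8))
```

The point of the slide is that such an address sequence is produced by a data counter (an address generator), not by an instruction stream; the Python generator only mimics the resulting order.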


Why a new machine paradigm ???

The anti machine as the 2nd paradigm is the key to curricular innovation (rDPA, µprocessor)

... a Trojan horse to introduce data-stream-based issues to the classical mind set of programmers

Programming by flowware instead of software is very easy to learn (... same language primitives)

Flowware education: no fully fledged hardware expert needed to program configware

Domains & what we need

term | source for programming … | domain

software | instruction streams | time (procedural)

configware | resources (structures) | space (structural)

flowware | data streams | time (procedural)

we need data parallelism

we need paradigm twins


Our Contemporary Computer Machine Model

Machine model | resources: property | resources: programming source | sequencer: property | sequencer: programming source | state register

ASIC accelerator | hardwired | - | hardwired | - |

CPU | hardwired | - | programmable | Software (instruction streams) | program counter (in CPU)

RPU accelerator | programmable | Configware (configuration code) | programmable | Flowware (data streams) | data counters (in RAM*)

*) data counters of reconfigurable address generators in asM (auto-sequencing) data memory blocks

twin Paradigm Dichotomy: the same language primitives!


FPGA


Machine

twins


Procedural Languages Twins

imperative Software Languages | (super) systolic Flowware Languages

read next instruction | read next data item

goto (instruction address) | goto (data address)

jump to (instruction address) | jump to (data address)

instruction loop | data loop

instruction loop nesting | data loop nesting

instruction loop escape | data loop escape

instruction stream branching | data stream branching

no: no internally parallel loops | yes: internally parallel loops

But there is the Asymmetry: program counter vs. data counter(s) (for data parallelism)


Machine twins: different data movement

Who moves operand to operator if not an instruction? (if not Software?)

# | moving data between | data transport | execution triggered by | strategy

1 | von Neumann CPU cores | via common memory | instruction stream | moving data at run time

2 | (r)DPU cores within (r)DPA | piped thru directly from (r)DPU to (r)DPU | arrival of data (transport-triggered*) | moving at compile time the locality of execution

*) Daniel Tabak, Jack Lipovski

remember the Memory Wall (Patterson's Law)


time 2 space

mapping


POIIP: Loop turns into Pipeline [1979]

loop body -> (reconfigurable) DataPath Unit: rDPU

loop -> Pipeline of rDPUs

complex loop body -> complex rDPU or pipe network inside rDPU

nested loops -> complex pipe network

[Figure: CPU with Memory executing the loop, next to the equivalent pipeline of rDPUs]
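The loop-to-pipeline idea can be imitated in software as a rough sketch. The example below is an assumption-laden illustration, not the 1979 method itself: each operation of a loop body becomes one stage (standing in for an rDPU), and data items flow from stage to stage. Python generators run sequentially, so this models only the structure of the pipe network, not its parallel execution; the names `stage`, `loop_version` and `pipeline_version` are hypothetical.

```python
def stage(op, upstream):
    """One pipeline stage: applies op to each data item as it arrives."""
    for item in upstream:
        yield op(item)

def loop_version(items):
    """Time domain: one processor iterates the whole loop body per item."""
    out = []
    for x in items:
        t = x * 3
        t = t + 1
        out.append(t * t)
    return out

def pipeline_version(items):
    """Space domain: each operation of the loop body gets its own stage."""
    s1 = stage(lambda x: x * 3, iter(items))
    s2 = stage(lambda t: t + 1, s1)
    s3 = stage(lambda t: t * t, s2)
    return list(s3)
```

Both functions compute the same results; what changes is where the operations live: in successive time steps of one processor, or side by side in a chain of units.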


double

dichotomy

Double Dichotomy

Paradigm Dichotomy: instruction stream, von Neumann (Software-Domain, time domain) vs. data stream, Anti Machine (Flowware-Domain, time domain)

Relativity Dichotomy: Procedure, time (Software-Domain, time domain) vs. Structure, space (Configware-Domain, space domain)

Paradigm Dichotomy (2)

instruction stream, von Neumann (Software-Domain) vs. data stream, Anti Machine (Flowware-Domain): both in the time domain

software to flowware mapping ?

Relativity Dichotomy

Procedure, time (Software-Domain) vs. Structure, space (Configware-Domain)

time to space mapping


Relativity Dichotomy (2)

time domain (procedure domain), 2 phases:

1) programming instruction streams

2) run time

space domain (structure domain), 3 phases:

1) reconfiguration of structures

2) programming data streams

3) run time

time-iterative to space-iterative

loop transformation methodology: 70s and later

a time to space mapping: n time steps, 1 CPU -> 1 time step, n DPUs

a time to space/time mapping (Strip mining [D. Loveman, J-ACM, 1977]): n*k time steps, 1 CPU -> n time steps, k DPUs

the space dimension is limited (e.g. because of the chip size)
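The step counts on this slide can be checked with a small sketch. It is an illustration under assumptions: the k DPUs are only modelled (each strip is still evaluated sequentially in Python), and the names `strip_mine` and `run_strip_mined` are hypothetical. What the sketch shows is the bookkeeping: a loop of n*k iterations is cut into n strips of k iterations, and each strip counts as one time step because its k iterations would run side by side.

```python
def strip_mine(iterations, k):
    """Split the iteration space into strips of at most k iterations."""
    items = list(iterations)
    return [items[i:i + k] for i in range(0, len(items), k)]

def run_strip_mined(body, iterations, k):
    """Return (results, time_steps) for a loop mapped onto k modelled DPUs."""
    results = []
    strips = strip_mine(iterations, k)
    for strip in strips:                          # time dimension: one step per strip
        results.extend(body(i) for i in strip)    # space dimension: k DPUs, modelled
    return results, len(strips)

# 12 iterations on 4 modelled DPUs: n*k = 12, k = 4, so n = 3 time steps.
results, steps = run_strip_mined(lambda i: i * i, range(12), 4)
```

With k equal to the full iteration count the same function gives the pure time-to-space mapping (1 time step); the strip size is what the limited chip area constrains.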


Terminology

term | program counter | execution triggered by | paradigm

CPU | yes | instruction fetch | instruction-stream-based

(r)DPU** | no | data arrival* | data-stream-based

*) "transport-triggered"

**) does not have a program counter


von Neumann overhead vs. Reconfigurable Computing

overhead | von Neumann machine | hardwired anti machine / reconfigurable anti machine

instruction fetch | instruction stream | none*

state address computation | instruction stream | none*

data address computation | instruction stream | none*

data meet PU + other overh. | instruction stream | none*

i / o to / from off-chip RAM | instruction stream | none*

Inter PU communication | instruction stream | none*

message passing overhead | instruction stream | none*

*) configured before run time


coarse

grain


Coarse-grained Reconfigurable Array

SNN filter on (supersystolic) KressArray (mainly a pipe network)

array size: 10 x 16 rDPUs

rDPU: reconfigurable Data Path Unit, 32 bits wide

no CPU

Legend: rDPU not used; used for routing only (route thru only); operator and routing; port location marker; backbus connect

note: software perspective without instruction streams: pipelining

compiled by Nageldinger's KressArray Xplorer with Juergen Becker's CoDe-X inside

Really so simple ?

(recall this example !)

embarrassing reaction to Ulrich Nageldinger's talk at RAW 1996

by KressArray Xplorer [Ulrich Nageldinger], CoDe-X inside [Jürgen Becker]


Brick Wall in the Brain

immediately* a VIP jumps up: "But you can't implement decisions!"

Embarrassing: a top level R&D manager of a global IT corp. group

completely missing sense of Dichotomies: structural vs. procedural

*) discussion after the talk: RAW at Orlando, FLA

"But you can't implement decisions!"

S = R + (if C then A else B endif);

[Figure: section of a very large pipe network; inputs A, B, R, C; boxes labelled "=1" and "+"]

Software to Configware Migration: it's criminal that typical CS graduates don't know this!

illustrating that mono-rail education is fatal
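How a decision becomes structure can be sketched as follows. This is an illustrative model only, assuming the selection is realized by a 2-to-1 multiplexer feeding an adder (the usual hardware reading of such an expression, not necessarily the exact network drawn on the slide). The names `mux`, `adder` and `pipe_section` are hypothetical.

```python
def mux(select, if_true, if_false):
    """2-to-1 multiplexer: both data inputs are present, select picks one."""
    return if_true if select else if_false

def adder(a, b):
    """Adder unit of the pipe network."""
    return a + b

def pipe_section(r_stream, c_stream, a_stream, b_stream):
    """Data-stream version of: S = R + (if C then A else B endif)."""
    for r, c, a, b in zip(r_stream, c_stream, a_stream, b_stream):
        yield adder(r, mux(c, a, b))

s_stream = list(pipe_section([1, 2, 3],
                             [True, False, True],
                             [10, 20, 30],
                             [100, 200, 300]))
```

No branch instruction is involved: the condition C travels as one more data stream and steers the multiplexer, which is the structural counterpart of the procedural if-then-else.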


Dual paradigm mind set: an old hat

(mapping from procedural to structural domain)

Software mind set: instruction-stream-based: flow chart -> control instructions

Mapped into a Hardware mind set: action box = Flipflop, decision box = (de)multiplexer

1967: W. A. Clark: Macromodular Computer Systems; 1967 SJCC, AFIPS Conf. Proc.

1972: C. G. Bell et al: The Description and Use of Register-Transfer Modules (RTM's); IEEE Trans-C21/5, May 1972

[Figure: chain of flipflops (FF) passing a token bit (evoke)]
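The mapping "action box = flipflop, decision box = (de)multiplexer" can be tried out with a tiny simulation. This is a sketch under assumptions and not a model of Clark's macromodules or Bell's RTMs: a hypothetical flow chart with three action boxes A, B, C is given one flipflop per box, exactly one flipflop holds the token bit, and the decision after A steers the token to B or C like a demultiplexer. The names `step` and `run` are invented for this example.

```python
def step(token, cond):
    """One clock tick of a hypothetical one-hot (token bit) controller.

    Flow chart: action A, then a decision on cond (B if true, else C),
    then back to A.
    """
    nxt = {"A": 0, "B": 0, "C": 0}
    if token["A"]:                  # decision box: demultiplexer steering the token
        nxt["B" if cond else "C"] = 1
    if token["B"] or token["C"]:    # both branches rejoin at action A
        nxt["A"] = 1
    return nxt

def run(conds):
    """Return the sequence of active action boxes, one per clock tick."""
    token = {"A": 1, "B": 0, "C": 0}
    trace = []
    for cond in conds:
        trace.append(next(name for name, bit in token.items() if bit))
        token = step(token, cond)
    return trace

trace = run([True, False, False, True])
```

The control flow of the chart is carried entirely by where the single token bit sits, with no program counter and no control instructions.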


?


Old Paradigms and Methodologies

1884: 1st mass-produced electric computer (Hollerith)

1946: von Neumann Machine Paradigm

1980: Datastreams (Kung, Leiserson)

1984: 1st FPGA to market (Xilinx)

1989: Anti Machine** Paradigm (TU-KL)

1990: first rDPA* (Rabaey)

1994: higher Anti Machine** Programming Language (Flowware: TU-KL)

1995: super systolic array: rDPA (Kress)

1996+: SCCC (LANL), SCORE, ASPRC, Bee (UCB), ...

1997+: Discipline of Distributed Memory Architectures (IMEC …)

1997: 1st automatically partitioning Configware/Software Co-Compiler (TU-KL)

*) rDPA = reconfigurable Data Path Array

**) datastream machine (flowware machine): no "dataflow machine"!!

Loop Transformation Examples

sequential processes (host): loop 1-16 body endloop

loop unrolling: loop 1-8 body body endloop

strip mining: fork; loop 1-8 body endloop; loop 9-16 body endloop; join

resource parameter driven Co-Compilation (host / reconf. array): loop 1-8 trigger endloop; loop 1-4 trigger endloop; loop 1-2 trigger endloop
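The first two transformations of this slide can be written out as a sketch. It assumes the 16-iteration loop shown on the slide and an even iteration count; the function names are hypothetical, and the fork/join variant only models the two strips, which here still run one after the other.

```python
def rolled(body, lo=1, hi=16):
    """loop 1-16 body endloop"""
    return [body(i) for i in range(lo, hi + 1)]

def unrolled_by_two(body, lo=1, hi=16):
    """loop 1-8 body body endloop: half the trips, two bodies per trip."""
    out = []
    for trip in range((hi - lo + 1) // 2):
        i = lo + 2 * trip
        out.append(body(i))
        out.append(body(i + 1))
    return out

def fork_join(body, lo=1, hi=16):
    """fork; loop 1-8 and loop 9-16 as independent strips; join."""
    mid = (lo + hi) // 2
    first = [body(i) for i in range(lo, mid + 1)]
    second = [body(i) for i in range(mid + 1, hi + 1)]
    return first + second

def body(i):
    return i * 10
```

All three variants produce the same results in the same order; they differ in how many loop trips there are and in which iterations could be handed to separate resources.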


The impact of shifting to multicore

4 P issues:

• performance

• programmer productivity

• program efficiency

• power consumption

market trends

CV of Reiner Hartenstein

• Professor (ordinarius emeritus), TU Kaiserslautern

• All acad. degrees from KIT Karlsruhe Institute of Technology (his mentor: Karl Steinbuch)

• IEEE fellow, SDPS fellow, FPL fellow, best paper awards, other awards

• Credited to be "The father of Reconfigurable Computing" (also pre-FPGA era) [1]

• Creator of KARL [2] (a Pascalish hardware language), most successful [3] trailblazer HDL before VHDL came up

• EU grant (80s), 85 mio ECU (pre-€): complete EDA framework [4,5] around KARL

• 1981: visiting professor at UC Berkeley (& coop. w. Xerox PARC)

• 1983: founder of the German contribution to the Mead-&-Conway VLSI design revolution: the multi university "E.I.S. project" (gov. grant: 38 million Deutschmark)

• Founder / co-founder of several international annual conference series

• his hobby: giving keynotes, http://hartenstein.de/keynotes.htm

[1] qu. Viktor Prasanna (with Gerald Estrin as the grandfather of Reconfigurable Computing, who proposed it in 1960 WJCC)

[2] R. Hartenstein: Fundamentals of Structured Hardware Design; American Elsevier, 1977 -- Bestseller; 1977 & later used as a textbook at UC Berkeley (not only here)

[3] for users, usage details, quotations, etc. see: http://www.fpl.uni-kl.de/staff/hartenstein/KARLUsers.html

[4] R. Hartenstein: The History of KARL and ABL; in: J. Mermet (editor): Fundamentals and Standards in Hardware Description Languages; ISBN 0-7923-2513-4, Kluwer (now Springer), September 1993. also see: http://xputers.informatik.uni-kl.de/karl/karl_history_fbi.html

[5] format-checking functional floorplan graphic editor, and textual editors, calculus-based term rewriting floorplan generator, embedded router, automatic test generation, testability analysis, structured logic synthesis, simulator, et al. -- also see [4]


Conclusions

• key issues: performance and energy consumption of programs

• need to master the hetero of all 3: Singlecore, Multicore, & Reconfigurable Computing

• additional Flowware / Configware skills are essential qualifications for programmers.

• hetero tools, environments and lab courses are a cardinal problem

• massive long term R&D funding required, like known from DARPA

• Mead-&-Conway-style SE Revolution toward twin-paradigm education is urgently needed

Conclusions (2)

To maintain a Booming Multicore Era: possible for 2 or 3 more decades?

Not without Reconfigurable Computing!

[Figure: relative performance versus year, 2004 to 2030, marking the end of the single-core era]


Energy Cost of Computing

Immense energy consumption of the internet (2005)

• NY city server farms: 1/4 km2 building floor area

• Amsterdam's electricity consumption: 25% to server farms

• Google, Microsoft ...: huge datacenters at the Columbia River; ORNL benefits from the Tennessee Valley Authority.

• Google's annual electricity bill: > 50,000,000 $ (in 2005*)

• Google: patent for a "water-based data center" using the ocean to provide power and cooling (Pelamis Wave Energy Converter).

*) when Brent oil price was around 40$

the impact of Reconfigurable Computing and its potential to rescue us from the coming severe energy crisis

In contrast to traditional computing by software-driven CPUs, Reconfigurable Computing offers an overwhelming reduction of electricity consumption, as well as massive speed-up factors: both by up to several orders of magnitude.


We are not aware of the rapidly growing immense electricity consumption of all computers, directly visible or embedded in all kinds of devices, appliances, machines, facilities, complexes, and other computer-based cyber infrastructures.

For the internet alone, an increase by a factor of 30 by the year 2030 has been predicted [107] "if the trend continues". This means a much higher electricity consumption than that of the entire world today.

This trend must not continue, since it is unaffordable.

The climate protection scene completely ignores these highly dramatic electricity consumption predictions.

However, to avoid a breakdown of the world economy we need these cyber infrastructures.

Only Reconfigurable Computing can prevent the running of these infrastructures from becoming unaffordable in the future.

This is very urgent, and we have to complete our rescue actions much earlier than 2030.

Reconfigurable Computing: why it has the potential to save us from the future disaster, what problems have to be solved, and what campaign of actions is needed.

For our modern civilization both are essential survival issues: Reconfigurable Computing (RC), as well as the tremendous electricity consumption of computing.

This should find priority attention by the media as soon as possible.

To prepare, to organize, and to implement the extensive rescue actions needed will take a lot of time and effort.

For this reason we cannot afford any delay in placing a widely noticed alarm signal by our mass media.


Parkinson's Laws & Hack's Law (an animation)

Parkinson's Law: work expands to fill the time available for its completion; data expands to fill the space available for storage

Hack's Law: ... overflow the ... and spill on the floor leaving a very sticky mess

Growth needed beyond Moore's Law

[Figure: relative performance (log scale, 10^3 to 10^13) versus year, 1970 to 2030, marking the end of the single-core era]


Growing number of wireless ICT features*, creating demand for software performance

*) only a few examples: beamer interface, micro beamer, video recording, smart scanner, speech recognition, text recognition, navigator, portable TV, wireless books, remote conferencing, music, TV, foto, radio, 5 megapixel, zoom, autofocus, face recognition, smart features

The end of the GHz race

the end of the single-core era

[Figure labels: number of transistors / processor cores; doubles every year / 18 months]


Simple KressArray Configuration Example

126