Download - Athapascan-1 Interface C++ de programmation parallèle Équipe APACHE LMC/ID - IMAG Jean-Guillaume Dumas, Nicolas Maillard Jean-Louis Roch, Thierry Gautier

Athapascan-1

Interface C++ de

programmation parallèle

Équipe APACHELMC/ID - IMAG

Jean-Guillaume Dumas, Nicolas MaillardJean-Louis Roch, Thierry GautierMathias Doreille, François Galilée, Gerson Cavalheiro

Caractéristique du modèle de programmation(coté utilisateur du modèle)

Granularité explicite – grain de donnée = objet partagé

Shared< int > n ; … Shared< array > tab ;

– grain de calcul = tâche (appel de procédure) Fork fn( n) ;

Parallélisme entre les tâches implicite Indication des actions des tâches sur les objets partagéspar typage explicite { lecture r, écriture w, modification r_w, accumulation

cw}

void fn ( Shared_r < int > a ) { … }

Sémantique naturelle, de type séquentielle – toute lecture d’une donnée voit la dernière écriture dans l’exécution séquentielle

Programmation en Athapascan-1

Pré-requis : un peu de C++

Beaucoup d’Athapascan-1– Interface : Fork et Shared– un exemple de programme– accumulation, accès différé, ordre– format, compilation, exécution

Pré-requis : un peu de C++– Pointeur int* Référence int& Attribut const

– Classe : constructeur (de recopie), destructeurclass cplx { private: double re, im ; public: cplx(double x, double y){re = x; im = y; } // Constructeur cplx(const cplx& z) { re = z.re ; im = z.im; } // Cstor recopie double reel() { return re ; } // méthode ~cplx() { … } // Destructeur };

– Flots : opérateurs << et >> ostream& operator<< (ostream& o, const cplx& z) { cout << z.reel() << ‘’ + i. ‘’ << z.imag() ; }

– Fonction-classe : struct module1 { double operator() ( const double x, double y){return x.reel()+x.imag(); }

cplx a; cout << module1()( a ) ;

Athapascan-1 : librairie C++

• Fork < f > ; exécution « parallèle » de la fonction f

• struct Hello { void operator()( int i ) { cout << « ( Hello » << i << « ) » ;} };

• void main() { …. Hello() ( 0 ) ; for (int i=1 ; i < 10; i++ ) a1::Fork<Hello>() ( i ) ; // création de tâche

• Sortie possible :( Hello 0 ) (Hello 3) ( Hello 1) (Hello (Hello 7 (Hello …

Que peut prendre une tâche en paramètre ?• Tout type qui se comporte comme un int

– constructeur vide : T() { … }– constructeur de recopie : recopie physique– destructeur

• Opérateurs d’emballage/déballage (distribué)– a1::ostream& operator<<( a1_ostream& out, const cplx& x) {

out << x.re << x.im ; } ;– A1::istream& operator>>( a1_istream& in, cplx& x) {

in >> x.re >> x.im ; } ;

• Passage par valeur ou par référence

Passage par référence : shared

• Objet partagé : Shared<T> x

• Déclaration avec initialisation : a1::Shared<int> x ( new int(1) ) ;

• Exemple : a1::Fork<facto>() ( x, 3 ) ; a1::Fork<print>() (x ) ;

• Les 2 tâches accèdent le même objet x : -> il faut contrôler les dépendances

Dépendance : typage des accès• lecture=R : pas possible de modifier l ’objet :

accès à la valeur de x : x.read()struct print { void operator()(Shared_r<int> a) { cout << a.read(); }};

• écriture=W : pas possible de ;}}; lire l ’objet : affectation de x : x.write( T* val )struct facto { void operator()(Shared_w<int> r, int n) { int x = n; for( int i=n ; i ; x*= --i) ; r.write ( new int( x ) ) ;}};

• lecture-écriture=R_W : accès en modification accès en maj de x : x.access()struct incr { void operator()(Shared_r_w<int> a) { a.access() += 1 ;}};

• lecture=R : pas possible de modifier l ’objet : accès à la valeur de x : x.read()struct print { void operator()(Shared_r<int> a) { cout << a.read(); }};

• écriture=W : pas possible de ;}}; lire l ’objet : affectation de x : x.write( T* val )struct facto { void operator()(Shared_w<int> r, int n) { int x = n; for( int i=n ; i ; x*= --i) ; r.write ( x ) ;}};

• lecture-écriture=R_W : accès en modification accès en maj de x : x.access()struct incr { void operator()(Shared_r_w<int> a) { a.access() += 1 ;}};

Sémantique séquentielle

• Toute lecture voit la dernière écriture selon l’ordre séquentiel

• shared<T> A[n][n], B[n][n], C[n][n] ;… // Initialisations A, B et Cfor (int i=0; i<N; i++) for (int j=0; j<N; j++) for (int k=0; k<N; k++) Fork< axpy_in > () ( C[i][j], A[i][k], B[k][j] ) ;

• struct axpy_in { void operator() (Shared_rw<T> c, Shared_r<T> a, Shared_r<T> b ) { c.access() += a.read() * b.read() ; } };


Beaucoup d’Athapascan-1– Interface : Fork et Shared

– un exemple de programme

– accumulation, accès différé, ordre

– format, compilation, exécution

struct fib { void operator()( int n, Shared_w< int > r ) {

if( n<2 ) r.write(new int(n));

else {

Shared< int > x, y;

Fork< fib >() ( n-1, x );

Fork< fib >() ( n-2, y );

Fork< sum >() ( x, y, r );

}

}

Ex : Fibonacci récursif

F(0)

F(1)

F(1)

F(2)

F(3)

+

+

01

1F(0)F(1)

F(2)

F(4)

+

+

01

struct sum {void operator() ( Shared_r< int > a,

Shared_r< int > b, Shared_w< int > c ) { c.write( new int( a.read() + b.read() ) ;

}

F(0) = 0, F(1) = 1F(n) = F(n-1) + F(n-2)

Analyse dynamique du flot de données fib(3)

F(0) = 0, F(1) = 1F(n) = F(n-1) + F(n-2)

r

fib(3)fib(3)fib(3)fib(3)

sum

r/x r/y

fib(2) fib(1)Terminé

Prêt

Attente

Exécution

Shared<int> x, y;

Fork<fib>()( n-1, x );

Fork<fib>()( n-2, y );

Fork<sum>()( x, y, r );


F(0) = 0, F(1) = 1F(n) = F(n-1) + F(n-2)

fib(2) fib(1)

sum

r/x r/y

r

fib(1)

1

fib(2)fib(2) sum

r/x r/y

fib(1) fib(0)

Terminé

Prêt

Attente

Exécution

1

sum

r/x r/y

fib(1) fib(0)

sum

r/x

r

fib(1) fib(0)

1 0

sumsum


Terminé

Prêt

Attente

Exécution

F(0) = 0, F(1) = 1F(n) = F(n-1) + F(n-2)

sum

1

sumsumsum

2

1 0

11

fib(3) = 2

gestion du graphe contrôle de la sémantique [Chap. 3, Prop. 4]





– format, compilation, exécution

Ecriture concurrente - Accumulation

• CW : Concurrent write : Possibilité d’accumuler en concurrence à partir d’une valeur initiale

• Typage accès : Shared_cw<fn_cumul, T > xAccumulation : x.cumul( val ) ;

• struct fn_cumul { void operator()(T& res, const T& s) {

res += s ; // accumulation de s sur res

• Accumulations avec même fonction :concurrentes

-> ordre quelconque (commutatif + associatif)

Accumulation: exemple

• shared<T> A[n][n], B[n][n], C[n][n] ;… // Initialisations A, B et Cfor (int i=0; i<N; i++) … for j … for k ... Fork< axpy_in > () ( C[i][j], A[i][k], B[k][j] ) ;

• struct add void operator()(T& res, const T& s) { res += s ; }

• struct axpy_in { void operator() (Shared_cw<add,T> c, Shared_r<T> a, Shared_r<T> b ) { c.cumul( a.read()*b.read()) ; } };

Shared<T> : déclaration et passage

• Déclaration : Shared<T> x ( val_init ) ;

• Passage en paramètre : typage du droit d’accès

– droit d’accès avec accès autorisé : _r _w _r_w _cw < F_cumul, >

– droit d’accès avec accès différé : _rp _wp _rp_wp _cwp < F_cumul, >

• la tâche ne peut que transmettre le droit d’accès (via Fork)

• mais ne peut pas accéder la valeur

Restrictions

• Pas d’effets de bord sur des objets partagés : Shared variables globales : a1_global_tsp<F,T1,T2>

• … x … ; Fork<F>() ( x ) ; Autorisé ssi le droit possédé par l’appelant sur xest supérieur à celui requis par F (l’appelé)

• Ordre partiel : Shared > tout évidemment Shared_rp_wp > Shared_rp = Shared_r Shared_rp_wp > Shared_wp > Shared_w > Shared_cw[p] Shared_rp_wp > Shared_r_w = Attention : _r_w maj de la valeur possible, mais pas de Fork !

Justification des restrictions• Détection du parallélisme + éviter les copies• Restrictions sémantique naturelle sans perte de

parallélisme• Conséquence : ces 2 programmes Athapascan-1 sont équivalents

struct { void operator () ( <args> ) { stmts_1 ; Fork<F1>()( <args1 > ) ; stmts_2 ; Fork<F2>()( <args2 > ) ; stmts_3 ;}

struct { void operator () ( <args> ) { deque d ; stmts_1 ; push(d, <args1 > ) ; stmts_2 ; push(d, <args2 > ) ; stmts_3 ; Fork<F1>()( pop(d) ); Fork<F2>()( pop(d) );}





– format, initialisation, compilation, exécution

Initialisation/Terminaison

• int main( int argc, char** argv) { …. a1_system::init( argc, argv ) ; …. // tous les processus lourds exécutent a1_system::init_commit() ; // synchronisation …. if (a1_system::self_node() == 0) { …. // le « corps » du main : Fork, … } a1_system::terminate(); // attente fin des tâches return 0 ; }

Compilation / Exécution

• Environnement :source ~maillard/ATHAPASCAN/sparc_DIST_INSTALL/bin/a1_setup.csh ou …/ix86_SMP_INSTALL/… etc

• Makefile : gmake clean; gmake fibo

include $(A1_MAKEFILE) # CXXFLAGS += -DSEQUENTIAL

• Exécution : séquentiel : fibo 12 SMP : fibo 12 -a1_pool 4 distribué : a0run fibo 12 -a0n 3 -a1_pool 4:2:5 -a1_stat -a1_trace_file fich

• Annotation ordonnancement, visualisation : cf doc

http://www-apache.imag.fr/software/ath1

Communications (MPI,…) Threads (POSIX, Marcel, …)

Athapascan-0

Projet APACHE et environnement de programmation parallèle ATHAPASCAN

• Programmation efficace et portable des machines parallèles

• Applications cibles : applications régulières ou irrégulières

• Machines cibles : SMP, architecture distribuée, grape de SMP

Vis

uali

sati

on

Interface applicative

Portabilité matérielle

Athapascan-1

Applications

Thème de la thèse : définition de l’interface applicative validation pour la programmation en calcul scientifique

Download - Athapascan-1 Interface C++ de programmation parallèle Équipe APACHE LMC/ID - IMAG Jean-Guillaume Dumas, Nicolas Maillard Jean-Louis Roch, Thierry Gautier

Top Related