computational scienceofcomputer systems -...

33

Upload: hahanh

Post on 06-Sep-2018

228 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Computational Science of Computer SystemsMéthodologies d'expérimentation pour l'informatique distribuée à large échelle

Martin QuinsonUniversité de Lorraine, Inria Nancy

Joint work with many colleagues: P. Bédaride, H. Casanova, P.N. Clauss, G. Corona,

A. Degomme, F. Desprez, S. Genaud, A. Giersch, M. Guthmuller, Arnaud Legrand,

Stephan Merz, L. Nussbaum, C. Rosa, M. Stillwell, Frédéric Suter, C. Thiéry, and many interns.

January 21th 2015Rennes

Page 2: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

About Me

Curriculum VitæI 1999: Maîtrise (informatics) at Université de Saint ÉtienneI 2003: PhD (informatics) at ENS-LyonI 2004: Post-Doctoral Researcher at University of California, Santa Barbara.I 2004: Temporary teaching assistant at Université de Grenoble.I Since 2005: Assistant professor at Université de Lorraine / Telecom NancyI 2011 � 2013: On leave at Inria Nancy � Grand EstI March 2013: Habilitation ThesisI 2013 � 2014: Leader of the Algorille joint team (Algorithms for the Grid)I 2015 �: Member of the VeriDis joint team (Veri�cation of Distributed Systems)I Future? Professor in ENS-Rennes (team Myriads) ?

Research Common Theme since 15 years

I Discovery and Modeling of Large-scale HPC Systems (since my M.S work!)I Make them usable by others: e.g., provide performance models to schedulersI One of the main contributors to SimGrid, a scienti�c instrument for such studies

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 2/20

Page 3: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Modern Computers are Large and Complex

Massive Parallelism

1. Tianhe-2 (China) 3,120,000 cores 18MW2. Titan (USA) 560,640 cores 8MW3. Sequoia (USA) 1,572,864 cores 8MW4. K Computer (Japan) 705,024 cores 13MW5. Mira (USA) 786,432 cores 4MW

Computational Science ; ExaScale Systems

I Huge impact in all sciences and techniques and industries and businessesI 1 Exa�op = 1018 operations. One million million million operations. . .

Not only in Computational Science

I Google dissipates 300MW ; Botnets control millions of zombie computersI In addition, these systems are heterogeneous and dynamic

So, how do we study these beasts?

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 3/20

Page 4: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Computational Science of Computer Systems

My Research Field: Methodologies of Experimentation

I Goal: assess the performance and correctness of large-scale computer systemsI Question: Are we really producing scienti�cally sound results?I Main contribution: SimGrid, a simulator of large-scale computer system

My approach: I am a physicist

I Empirically consider large-scale computer systems as natural objectsI Eminently arti�cial artifacts, but complexity reaches �natural� levelsI Other sciences routinely use computers to understand complex systems

[PPL'09], cited 52 times.

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 4/20

Page 5: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Simulating Distributed Systems

Simulation: Fastest Path from Idea to DataI Get preliminary results from partial implementationsI Experimental campaign with thousands of runs within the weekI Test your scienti�c idea, don't �ddle with technical subtleties (yet)

Idea or MPI code

Experimental Setup

+ ⇝Scientific Results

Models

Simulation

Simulation: Easiest Way to Study Distributed ApplicationsI Everything is actually centralized: Partially mock parts of your protocolI No heisenbug: (Simulated) time does not change when you capture more dataI Clairevoyance: Observe every bits of your application and platformI High Reproducibility: No or very few variabilityI Capacity planning: What if network or CPU were fasterI Don't waste resources to debug and test

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 5/20

Page 6: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Simulating Distributed Systems

Simulation: Fastest Path from Idea to DataI Get preliminary results from partial implementationsI Experimental campaign with thousands of runs within the weekI Test your scienti�c idea, don't �ddle with technical subtleties (yet)

Idea or MPI code

Experimental Setup

+ ⇝Scientific Results

Models

Simulation

Simulation: Easiest Way to Study Distributed ApplicationsI Everything is actually centralized: Partially mock parts of your protocolI No heisenbug: (Simulated) time does not change when you capture more dataI Clairevoyance: Observe every bits of your application and platformI High Reproducibility: No or very few variabilityI Capacity planning: What if network or CPU were fasterI Don't waste resources to debug and test

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 5/20

Page 7: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Simulation Challenges

Idea or MPI code

Experimental Setup

+ ⇝Scientific Results

Models

Simulation

Challenges for the Tool Makers

I Validity: Get realistic results (controlled experimental bias)I Scalability: Fast enough and Big enough

I Open Science: Integrated lab notes, runner, post-processing (data provenance)

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 6/20

Page 8: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Simulation of Parallel/Distributed Systems

Network Protocols: Standards emerged: GTNetS, DaSSF, OmNet++, NS3Huge amount of non-standard tools in other domains:

I Grid Computing OptorSim ChicagoSim GridSim JFreeSim . . .

I Peer-to-peer P2Psim SimP2P PeerSim OverSim . . .

I Volunteer Computing SimBA EmBOINC SimBOINC . . .

I HPC/MPI Dimemas PSinS BigSim LogGoPSim XSim SST . . .

I Cloud Computing CloudSim GroudSim iCanCloud GreenCloud . . .

This raises severe methodological/reproducibility issues:I Short-lived, badly supported (software QA), sparse validity assessment

SimGrid: a 15 years old joint projectI Versatile: Grid, P2P, Clouds, HPC, VolunteerI Collaborative: (ANR, CNRS, Univ., Inria) Open Source: active communityI Widely used: 150 publications by 120 individuals, 30 contributors

http://simgrid.org/ [UKSim'08] cited 350, [JPDC'14]

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 7/20

Page 9: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Simulation of Parallel/Distributed Systems

Network Protocols: Standards emerged: GTNetS, DaSSF, OmNet++, NS3Huge amount of non-standard tools in other domains:

I Grid Computing OptorSim ChicagoSim GridSim JFreeSim . . .

I Peer-to-peer P2Psim SimP2P PeerSim OverSim . . .

I Volunteer Computing SimBA EmBOINC SimBOINC . . .

I HPC/MPI Dimemas PSinS BigSim LogGoPSim XSim SST . . .

I Cloud Computing CloudSim GroudSim iCanCloud GreenCloud . . .

This raises severe methodological/reproducibility issues:I Short-lived, badly supported (software QA), sparse validity assessment

SimGrid: a 15 years old joint projectI Versatile: Grid, P2P, Clouds, HPC, VolunteerI Collaborative: (ANR, CNRS, Univ., Inria) Open Source: active communityI Widely used: 150 publications by 120 individuals, 30 contributors

http://simgrid.org/ [UKSim'08] cited 350, [JPDC'14]

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 7/20

Page 10: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimGrid Key Features: Fluid Network Model

I Packet level models: Full net stack. Inherently slow, hard to instantiateI Simple models: Delay-based, distribution, coordinates

Very scalable, but no topology, no network congestion

I Fluid models: Share bandwidth between �ows on macroscopic evts

Bandwidth sharing as an optimization problem∑if �ow i uses link j

ρi 6 Cj

I Max-Min objective function: max (min (ρi ))

I Reno fairness: max(∑

arctan (ρi ))

I Vegas fairness: max(∑

log (ρi ))

We implement, (in)validate and optimize these models since 10 years

I The classical "Observe, Analyze, Hypothesis, Test" loop

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 8/20

Page 11: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimGrid Key Features: Fluid Network Model

I Packet level models: Full net stack. Inherently slow, hard to instantiateI Simple models: Delay-based, distribution, coordinates

Very scalable, but no topology, no network congestion

I Fluid models: Share bandwidth between �ows on macroscopic evts

Bandwidth sharing as an optimization problem∑if �ow i uses link j

ρi 6 Cj

I Max-Min objective function: max (min (ρi ))

I Reno fairness: max(∑

arctan (ρi ))

I Vegas fairness: max(∑

log (ρi ))

We implement, (in)validate and optimize these models since 10 years

I The classical "Observe, Analyze, Hypothesis, Test" loop

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 8/20

Page 12: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Validity Success Stories

unmodi�ed NAS CG on a TCP/Ethernet cluster (Grid'5000)

B

10

15

20

25

30

CG

6432 128168Number of nodes

Spe

edup

Modelq Real

SimGrid

LogGPS

Key aspects to obtain this result

I Network Topology: Contention (large msg) and Synchronization (small msg)I Applicative (collective) operations (stolen from real implementations)I Instantiate Platform models (matching e�ects, not docs)I All included in SimGrid but the instantiation (remains manual for now)

[IPDPS'11] cited 35, [SC'13]Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 9/20

Page 13: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Validity Success Stories

unmodi�ed NAS CG on a TCP/Ethernet cluster (Grid'5000)

B

qq

qqq

10

15

20

25

30

CG

6432 128168Number of nodes

Spe

edup

Modelq RealNetwork collapse

SimGrid

LogGPS

Discrepency between Simulation and Real Experiment. Why?

I Massive switch packet drops lead to 200ms timeouts in TCP!I Tightly coupled: the whole application hangs until timeoutI Noise easy to model in the simulator, but useless for that very studyI Our prediction performance is more interesting to detect the real issue

[IPDPS'11] cited 35, [SC'13]Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 9/20

Page 14: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Agenda

Introduction

Computational Science of Computer Systems (CS2)

Simulation Models

Parallel Simulation of Discrete Event Systems

Dynamic Veri�cation of Distributed Applications

Conclusion

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 9/20

Page 15: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Parallel Simulation of Discrete Event Systems

Classical Parallel Schema: split the whole applicative modelLP #1 LP #2

LP #3 LP #4

I Leads to good speedups (but still poor performance)dPeerSim: 4h → 1h when 2 LPs → 16 LPs (but 50s in sequential PeerSim)

New approach: Split at Virtualization layer (not in simulation engine)I Virtualization contains threads (user's stack)I Engine & Models remains sequential

Models + EnginesVirtualization + SynchroUser

tnMU1

U2

U3

tn+1 tn+2

I Synchronization costs of paramount importance

Sim

ulation

Workload User Code

Virtualization Layer

Networking Models

Simulation Engine

ExecutionEnvironment

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 10/20

Page 16: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Parallel Simulation of Discrete Event Systems

Classical Parallel Schema: split the whole applicative modelLP #1 LP #2

LP #3 LP #4

I Leads to good speedups (but still poor performance)dPeerSim: 4h → 1h when 2 LPs → 16 LPs (but 50s in sequential PeerSim)

New approach: Split at Virtualization layer (not in simulation engine)I Virtualization contains threads (user's stack)I Engine & Models remains sequential

Models + EnginesVirtualization + SynchroUser

tnMU1

U2

U3

tn+1 tn+2

I Synchronization costs of paramount importanceSim

ulation

Workload User Code

Virtualization Layer

Networking Models

Simulation Engine

ExecutionEnvironment

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 10/20

Page 17: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

E�cient Parallel Fine-Grained SimulationSimGrid is an Operating System

Simcalls separate processes, alleviating locking issues

I Very similar to syscalls in an operating system

Functional View

Process Process Process

SimCall Interface

MaestroSimulation Modelske

rnel

Temporal View

Models+Engines

Virtualization + SynchroUser (isolated)

simcallrequest answer

actual interaction

M

U2

U1

U3

Leveraging Multicores

⇒ More processes than cores ; Worker Threads (that execute co-routines ;)

Worker Worker Worker

MaestroSimulation Modelske

rnel

Processes

Functional View

T1tn

T2

tn+1M

Temporal View

... ...T2

Tn

T1

fetch_add()futex_wait()futex_wake()

Ideal Algorithm

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 11/20

Page 18: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

E�cient Parallel Fine-Grained SimulationSimGrid is an Operating System

Simcalls separate processes, alleviating locking issues

I Very similar to syscalls in an operating system

Functional View

Process Process Process

SimCall Interface

MaestroSimulation Modelske

rnel

Temporal View

Models+Engines

Virtualization + SynchroUser (isolated)

simcallrequest answer

actual interaction

M

U2

U1

U3

Leveraging Multicores

⇒ More processes than cores ; Worker Threads (that execute co-routines ;)

Worker Worker Worker

MaestroSimulation Modelske

rnel

Processes

Functional View

T1tn

T2

tn+1M

Temporal View

... ...T2

Tn

T1

fetch_add()futex_wait()futex_wake()

Ideal Algorithm

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 11/20

Page 19: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Performance Results

I Scenario: Initialize Chord, and simulate 1000 seconds of protocolI Arbitrary Time Limit: 12 hours (kill simulation afterward)

0

10000

20000

30000

40000

0 500000 1e+06 1.5e+06 2e+06

Run

ning

tim

e in

sec

onds

Number of nodes

OverSim (OMNeT++)PeerSim

OverSim (simple underlay)SimGrid (sequential)SimGrid (4 threads)

Largest simulated scenarioSize Time

Omnet++ 10k 1h40PeerSim 100k 4h36OverSim 300k 10h

SimGrid, seq10k 32s300k 32mn2M 6h18

SimGrid//10k 130s300k 40mn2M 5h55

Memory Usage18kiB /process (stack: 12kiB)

First time that PDES is (a little) faster than DES [CCGrid'12], cited 17

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 12/20

Page 20: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Agenda

Introduction

Computational Science of Computer Systems (CS2)

Simulation Models

Parallel Simulation of Discrete Event Systems

Dynamic Veri�cation of Distributed Applications

Conclusion

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 12/20

Page 21: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Assessing the Correctness of HPC codes?

Writing Distributed Apps is notoriously di�cult, but

The Good Old Days

I MPI codes circumvented issues with rigid communication patterns

I Performance First: fast code that rarely fail-stop ≫ correct slow code

These Days are Now Over

I But rigid patterns do not scale! We now have to release the gripI But this is dangerous! We now have to explicitly seek for correctness

Slowly, old ignored problems resurface

I When Tests are not enough anymore, turn to Formal Methods

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 13/20

Page 22: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Model Checking and Dynamic Veri�cation

Automated Formal MethodsI Try to assess the correctness of a system by actively searching for faultsI If no fault found after an exhaustive search, correctness experimentally provedI Dynamic Veri�cation: Model Checking applied to real applications

Exhaustive Exploration1

2

iSend

3

WaitTimeout

1 1

iRecv

4

iRecv

5

Test FALSE

6

MC_RANDOM

7

MC_RANDOM

8

MC_RANDOM

9

MC_RANDOM

Test FALSE

1 2

Wait

1 3

iRecv

2 4

Test TRUE

1 4

WaitTimeout

1 5

Test TRUE

1 6

iSend

1 7

iRecv

1 8

Test FALSE

1 9

MC_RANDOM

2 0

MC_RANDOM

2 1

MC_RANDOM

2 2

MC_RANDOM

Test FALSE

2 5

iRecv

2 6

WaitTimeout

2 9

iSend

2 7

iSend

iRecv

3 0

Wait

3 8

iRecv

3 1

iRecv

3 2

Test FALSE

3 3

MC_RANDOM

3 4

MC_RANDOM

3 5

MC_RANDOM

3 6

MC_RANDOM

Test FALSE

3 9

Wait

4 0

iRecv

4 1

Test FALSE

4 2

MC_RANDOM

4 3

MC_RANDOM

4 4

MC_RANDOM

4 5

MC_RANDOM

Test FALSE

Model Checking: the Big Idea

I My preferred outcome: a counter-exampleI I tend to bug �nding, not certi�cation

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 14/20

Page 23: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Model Checking and Dynamic Veri�cation

Automated Formal MethodsI Try to assess the correctness of a system by actively searching for faultsI If no fault found after an exhaustive search, correctness experimentally provedI Dynamic Veri�cation: Model Checking applied to real applications

Exhaustive Exploration1

2

iSend

3

WaitTimeout

1 1

iRecv

4

iRecv

5

Test FALSE

6

MC_RANDOM

7

MC_RANDOM

8

MC_RANDOM

9

MC_RANDOM

Test FALSE

1 2

Wait

1 3

iRecv

2 4

Test TRUE

1 4

WaitTimeout

1 5

Test TRUE

1 6

iSend

1 7

iRecv

1 8

Test FALSE

1 9

MC_RANDOM

2 0

MC_RANDOM

2 1

MC_RANDOM

2 2

MC_RANDOM

Test FALSE

2 5

iRecv

2 6

WaitTimeout

2 9

iSend

2 7

iSend

iRecv

3 0

Wait

3 8

iRecv

3 1

iRecv

3 2

Test FALSE

3 3

MC_RANDOM

3 4

MC_RANDOM

3 5

MC_RANDOM

3 6

MC_RANDOM

Test FALSE

3 9

Wait

4 0

iRecv

4 1

Test FALSE

4 2

MC_RANDOM

4 3

MC_RANDOM

4 4

MC_RANDOM

4 5

MC_RANDOM

Test FALSE

Model Checking: the Big Idea

I My preferred outcome: a counter-exampleI I tend to bug �nding, not certi�cation

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 14/20

Page 24: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimGridMC: Formal Methods in SimGrid

Verify any application that would run in SimGrid

I Reuse the simulator's light virtualization to mediate apps' actionsI Replace the simulation kernel underneath with a model checkerI Tests all causally possible orders of events to dynamically verify the appI System-level checkpoints the app to then rewind and explore another pathI This works with SMPI, and MSG (our simple API to study CSP algorithms)

SIMIX

SURF

MSG SMPI SIMDAG

User Code

Platform

Description372435work

remaining

variable

530530

50664

245245

Concurrentprocesses

Synchro.abstractions

...

...

...

App. spec. as concurrent code

App. spec. astask graph

...

x1x2

x3

x3

+

xn

...

+

+ xn

Variables ResourceConstraints

6 CLm

6 CL2

6 CP

6 CL1x1

... ...

Activities

...

{

CL2

CLm CL1

Cp

1

2

iSend

3

iRecv

4

Wait

5

iRecv

1 5 3

Test TRUE

6

Test TRUE

7

iSend

8

Wait

9

iRecv

1 0

Test FALSE

1 1

MC_RANDOM (0)

1 1 5

MC_RANDOM (1)

1 2 2

iRecv

1 2

MC_RANDOM (0)

1 0 5

MC_RANDOM (1) 1 1 2

iRecv

1 3

MC_RANDOM (0)MC_RANDOM (1)

1 0 7

iRecv

MC_RANDOM (0)

1 5

MC_RANDOM (1)

2 4

iRecv

1 6

iSend

1 7

iRecv

1 8

Wait

1 9

Test TRUE

2 0

iSend

2 1

Wait

2 2

iRecv

Test FALSE

2 5

MC_RANDOM (0)

1 0 3

MC_RANDOM (1)

2 6

Test FALSE

2 7

MC_RANDOM (0)

6 6

MC_RANDOM (1)

2 8

MC_RANDOM (0)

6 2

MC_RANDOM (1)

2 9

MC_RANDOM (0)MC_RANDOM (1)

MC_RANDOM (0)

3 1

MC_RANDOM (1)

3 2

iSend

3 5

Test FALSE

3 3

Wait

Test TRUE

3 6

iSend

5 8

MC_RANDOM (0)

6 0

MC_RANDOM (1)

3 7

Wait

5 4

MC_RANDOM (0)

5 6

MC_RANDOM (1)

3 8

MC_RANDOM (0)

5 2

MC_RANDOM (1)

3 9

MC_RANDOM (0)

4 4

MC_RANDOM (1)

4 0

MC_RANDOM (0)

MC_RANDOM (1)

4 1

MC_RANDOM (0) MC_RANDOM (1)

4 2

Test TRUE

iSend

4 5

Test TRUE

4 6

iSend

4 7

Wait

4 8

iRecv

Test FALSE

Test TRUE

Wait Wait

iSend iSend

6 3

Test FALSE

Test FALSE

6 7

iSend

7 5

Test FALSE

6 8

Wait

6 9

Test TRUE

7 0

iSend

7 1

Wait

7 2

iSend

7 3

iRecv

Test FALSE

7 6

iSend

9 9

MC_RANDOM (0)

1 0 1

MC_RANDOM (1)

7 7

Wait

9 5

MC_RANDOM (0)

9 7

MC_RANDOM (1)

7 8

MC_RANDOM (0)

9 3

MC_RANDOM (1)

7 9

MC_RANDOM (0)

8 4

MC_RANDOM (1)

8 0

MC_RANDOM (0)

MC_RANDOM (1)

8 1

MC_RANDOM (0) MC_RANDOM (1)

8 2

Test TRUE

iSend

8 5

Test TRUE

8 6

iSend

8 7

Wait

8 8

iSend

8 9

iRecv

Test FALSE

Test TRUE

Wait Wait

iSend iSend

iSend

Test FALSE

MC_RANDOM (0)

1 0 9

MC_RANDOM (1)

Test FALSE

MC_RANDOM (0)

MC_RANDOM (1)

1 1 6

iSend

1 1 7

iRecv

1 1 8

Wait

1 1 9

Test TRUE

1 2 0

iSend

Wait

MC_RANDOM (0)

1 2 4

MC_RANDOM (1)

1 2 5

iSend

1 2 8

Test FALSE

1 2 6

Wait

Test TRUE

1 2 9

iSend

1 4 9

MC_RANDOM (0)

1 5 1

MC_RANDOM (1)

1 3 0

Wait

1 4 5

MC_RANDOM (0)

1 4 7

MC_RANDOM (1)

1 3 1

MC_RANDOM (0)

1 4 3

MC_RANDOM (1)

1 3 2

MC_RANDOM (0)

1 3 7

MC_RANDOM (1)

1 3 3

MC_RANDOM (0)

MC_RANDOM (1)

1 3 4

MC_RANDOM (0) MC_RANDOM (1)

1 3 5

Test TRUE

iSend

1 3 8

Test TRUE

1 3 9

iSend

Wait

Test TRUE

Wait Wait

iSend iSend

iRecv

1

2

iSend

3

WaitTimeout

1 1

iRecv

4

iRecv

5

Test FALSE

6

MC_RANDOM

7

MC_RANDOM

8

MC_RANDOM

9

MC_RANDOM

Test FALSE

1 2

Wait

1 3

iRecv

2 4

Test TRUE

1 4

WaitTimeout

1 5

Test TRUE

1 6

iSend

1 7

iRecv

1 8

Test FALSE

1 9

MC_RANDOM

2 0

MC_RANDOM

2 1

MC_RANDOM

2 2

MC_RANDOM

Test FALSE

2 5

iRecv

2 6

WaitTimeout

2 9

iSend

2 7

iSend

iRecv

3 0

Wait

3 8

iRecv

3 1

iRecv

3 2

Test FALSE

3 3

MC_RANDOM

3 4

MC_RANDOM

3 5

MC_RANDOM

3 6

MC_RANDOM

Test FALSE

3 9

Wait

4 0

iRecv

4 1

Test FALSE

4 2

MC_RANDOM

4 3

MC_RANDOM

4 4

MC_RANDOM

4 5

MC_RANDOM

Test FALSE

1

2

[(1)c-1.me] iRecv

3

[(2)c-2.me] iSend

4

[(1)c-1.me] Wait [(2)->(1)]

5

[(1)c-1.me] iSend

6

[(2)c-2.me] Wait [(2)->(1)]

7

[(2)c-2.me] iRecv

8

[(1)c-1.me] Wait [(1)->(2)]

9

[(1)c-1.me] iRecv

10

[(2)c-2.me] Wait [(1)->(2)]

11

[(2)c-2.me] iSend

12

[(1)c-1.me] Wait [(2)->(1)]

13

[(1)c-1.me] iRecv

14

[(2)c-2.me] Wait [(2)->(1)]

15

[(2)c-2.me] iSend

[(1)c-1.me] Wait [(2)->(1)]

17

[(2)c-2.me] Wait [(2)->(1)]

18

[(1)c-1.me] Wait [(2)->(1)]

[(1)c-1.me] iSend

20

[(2)c-2.me] iRecv

21

[(1)c-1.me] iSend

22

[(1)c-1.me] Wait [(1)->(2)]

23

[(1)c-1.me] iRecv

[(2)c-2.me] Wait [(1)->(2)]

25

[(3)c-3.me] iSend

26

[(1)c-1.me] Wait [(3)->(1)]

27

[(1)c-1.me] iRecv

28

[(2)c-2.me] Wait [(1)->(2)]

29

[(2)c-2.me] iSend

30

[(1)c-1.me] Wait [(2)->(1)]

31

[(1)c-1.me] iRecv

32

[(2)c-2.me] Wait [(2)->(1)]

33

[(2)c-2.me] iSend

34

[(1)c-1.me] Wait [(2)->(1)]

35

[(1)c-1.me] iSend

36

[(2)c-2.me] Wait [(2)->(1)]

37

[(2)c-2.me] iRecv

38

[(1)c-1.me] Wait [(1)->(2)]

39

[(1)c-1.me] iRecv

40

[(2)c-2.me] Wait [(1)->(2)]

41

[(2)c-2.me] iSend

[(1)c-1.me] Wait [(2)->(1)]

43

[(2)c-2.me] Wait [(2)->(1)]

44

[(1)c-1.me] Wait [(2)->(1)]

[(1)c-1.me] iRecv

46

[(2)c-2.me] iSend

47

[(1)c-1.me] iRecv

48

[(1)c-1.me] Wait [(2)->(1)]

49

[(1)c-1.me] iSend

[(2)c-2.me] Wait [(2)->(1)]

51

[(3)c-3.me] Wait [(3)->(1)]

52

[(2)c-2.me] Wait [(2)->(1)]

53

[(2)c-2.me] iRecv

54

[(1)c-1.me] Wait [(1)->(2)]

55

[(1)c-1.me] iRecv

56

[(2)c-2.me] Wait [(1)->(2)]

57

[(2)c-2.me] iSend

58

[(1)c-1.me] Wait [(2)->(1)]

59

[(1)c-1.me] iRecv

60

[(2)c-2.me] Wait [(2)->(1)]

61

[(2)c-2.me] iSend

62

[(1)c-1.me] Wait [(2)->(1)]

63

[(1)c-1.me] iSend

[(2)c-2.me] Wait [(2)->(1)]

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 15/20

Page 25: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Example: Out of order receive

I Two processes send a message to a third oneI The receiver expects the message to be in orderI This may happen. . . or not

rank1

rank2

rank0

send(1)

send(2)

x ← 1 y ← 2

x < y

if (MPI_rank() == 0) {

MPI_Recv(&x , MPI_ANY_SOURCE);

MPI_Recv(&y , MPI_ANY_SOURCE);

MC_assert(x < y);

} else {

MPI_Send (&rank , 0);

}

rank1

rank2

rank0

send(1)

send(2)

x ← 2 y ← 1

x 6< y

**************************

*** PROPERTY NOT VALID ***

**************************

Counter-example execution trace:

[(1)recver] iRecv (dst=recver, buff=(verbose only), size=(verbose only))

[(3)sender] iSend (src=sender, buff=(verbose only), size=(verbose only))

[(1)recver] Wait (comm=(verbose only) [(3)sender -> (1)recver])

[(1)recver] iRecv (dst=recver, buff=(verbose only), size=(verbose only))

[(2)sender] iSend (src=sender, buff=(verbose only), size=(verbose only))

[(1)recver] Wait (comm=(verbose only) [(2)sender -> (1)recver])

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 16/20

Page 26: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Mitigating the State Space ExplosionMany execution paths are redundant ; cut exploration when possible

Dynamic Partial Ordering Reduction (DPOR)

I Works on histories: test only one transitions' interleaving if independentI Independence theorems: Local events are independent; iSend+iRecv also; . . .I Must be conservative (exploration soundness at risk!)I It works well (for safety properties)

System-Level State Equality

I Works on states: detect when a given space was previously exploredI Complementary to DPOR (but not compatible yet)I Introspect the C/C++/Fortran app just like gdb (+some black magic)

[AVOCS'10] [PDP'15]

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 17/20

Page 27: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Some Results

Wild safety bug in our Chord implementation (≈ 500 lines of C)

I Simulation: bug on large instances only; MC �nds small trace (1s with DPOR)

Mocked liveness bug

I Buggy centralized mutual exclusion: last client never obtains the CSI About 100 lines � state snapshot size: 5Mib; Veri�ed with up to 7 processes

Verifying MPICH3 complience tests

I Looking for assertion failures, deadlocks and non-progressive cyclesI 6 tests; ≈ 1300 LOCs (per test) � State snapshot size: ≈ 4MB

I We veri�ed several MPI2 collectives too © (but all good so far /)

Protocol-wide Properties

I e.g, Send-deteministic: On each node, send and recv evts always in same order(allows more e�cient application checkpointing)

I Even harder than liveness properties (not LTL), but doable in SimGridComputational Science of Computer Systems Introduction CS2 Models PDES Formal CC 18/20

Page 28: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Much more to say about SimGrid (too little time)

Hybrid Network Models

I Fluid model: model contention in steady state for large messagesI LogOP model: model intra-node delays and synchronizationI Also: MPI collectives, TCP (slow-start, cross-tra�c), soon IB

Realistic EmulationI SMPI: Study real MPI applications within SimGridI Simterpose: Study real arbitrary applications (ongoing)

High Performance Simulation

I Fast Enough: Innovative PDES; E�cient algorithms and implementationsI Big Enough: Scalable and versatile platform representation

Formal Veri�cation of Distributed Apps

I Safety, Liveness or CTL properties, with DPOR or state equality

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 19/20

Page 29: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

Research Project for the Next 10 Years

Make Large-Scale Distributed Systems Easier to UseI They are pervasive in our connected societies, yet almost uncontrolledI Research Plan: Computational Science of Computer Systems

I Leveraging computers to understand computers

I Expected Visible Outcome: Propose a valgrind-like for Distributed SystemsI This is perfectly in line with what I'm doing since 15 years

Why in Myriads?

I Distributed Systems, with focus on experimentation (Grid'5000, etc)I Many works that solve hard OS-level issues to help distributed systems

Why at ENS Rennes?

I We need more teachers that are pro�cient with Systems Internals

I Teaching of paramount importance to me. Lots of activities on teaching CS:I Unplugged activities; Programming exercisers; Research groups; National Days

I More on this on Friday 12:30 :)

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 20/20

Page 30: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

What is SMPI?I Reimplementation of MPI on top of SimGridI Imagine a VM running real MPI applications on

platforms that does not existI Horrible over-simpli�cation, but you get the idea

I Computations run for real on your laptop,Communications are faked

What is it good for?I Performance Prediction (�what-if?� scenarios)

I Platform dimensioning; Apps' parameter tuning

I Teaching parallel programming and HPCI Reduced technical burdenI No need for real hardware, or hack your hardware

Studies that you should NOT attempt with SMPI

I Predict the impact of L2 caches' size on your codeI Interactions of TCP Reno vs. TCP Vegas vs. UDPI Claiming a simulation of 1000 billions nodes

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 20/20

Page 31: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimGrid Network ModelMeasurements

Small

Medium1Medium2

Detached

Small

Medium1Medium2

Detached

MPI_Send MPI_Recv

1e−04

1e−02

1e+01 1e+03 1e+05 1e+01 1e+03 1e+05Message size (bytes)

Dur

atio

n (s

econ

ds) group

Small

Medium1

Medium2

Detached

Large

Hybrid Model

Asynchronous (k 6 Sa)

T3

Pr

Ps

T1

T2

Detached (Sa < k 6 Sd)

Ps

Pr

T2T4

T1

Synchronous (k > Sd)

Ps

Pr

T4 T2

Fluid model: account for contention and network topology

1-39 40-74 105-14475-104

1G10G

;

DownUp DownUp DownUp DownUp

10G1G

1−39 40−74 105−14475−104

13G

10G

Limiter

... ...... ...1.5G

1G

Limiter

DownUp

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 20/20

Page 32: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimGrid Modeling of MPIMPI Collectives

I SimGrid implements more than 120 algorithms for the 10 main MPI collectivesI Selection logic from OpenMPI, MPICH can be reproduced

HPC TopologiesEmpty+coords

Full

Full

Dijkstra

Floyd

Rule−based

Rule−based

Rule−based

basedRule−

AS1

AS2

AS4

AS5

AS7

AS6

AS5−3

AS5−1 AS5−2

AS5−4

Torus Fat-trees Hierarchies of ASesBut also

I External load (availability changes), Host and link failures, Energy (DVFS)I Virtual Machines, that can be migrated; Random platform generators

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 20/20

Page 33: Computational ScienceofComputer Systems - IRISApeople.irisa.fr/Martin.Quinson/blog/2015/0121/CS2.pdf · Computational ScienceofComputer Systems Méthodologies d'expérimentation pour

SimTerpose Project

Dream: Simulate any applications on top of SimGrid

Simulated Setup

P1

Port

80

Host A

P2

Host B

Network

Port ??

Take 1: ptrace plumbering

P1 P2SimTerpose

10042 Real Port

Hosting Computer

1

2

3

4

ptrace interceptor

10042

1

2

3

connect(host,10042)

connect(A,80)accept(80)

accept(10042)

Take 2: Full Emulation

P1 P2 P3

ptrace interceptor

Threads in simulator

Direct memory access

Current StateI Functional POC: send/recv exchangeI Need to handle the other 200 syscalls

I Intercept, store metadataI Inform simulator, report e�ect on procs

I Time and DNS need love at link timeI We are redeveloping a libC! (in strange way ;)

Computational Science of Computer Systems Introduction CS2 Models PDES Formal CC 20/20