

Page 1: NVIDIA / PSCDS / UPSACLAY MEETING

Center for Data Science Paris-Saclay

March 30, 2015, LAL

CÉCILE GERMAIN, Pr / UPSud, LRI

ARNAK DALALYAN, Pr / ENSAE, Laboratoire de Statistique

ALEXANDRE GRAMFORT, MdC / Telecom ParisTech, LTCI

AKIN KAZAKÇI, MdC / Mines ParisTech, CGS

BALÁZS KÉGL, DR / CNRS, LAL & LRI, CNRS & University Paris-Sud

Page 2: OUTLINE

• Why did we launch this meeting?

• The Paris-Saclay CDS

  • challenges, courses, hackathons

• Where are we going?

  • the subject of the discussion after the talks

Page 3: WHY DID WE LAUNCH THIS MEETING?

• We do research (both applied and basic) on deep learning

• We run server-side-execution hackathons, which need significant resources for very short periods

  • typically 1 day every 6 weeks (not all of them are GPU-intensive)

  • mutualization (pooling of resources)

• Discuss a Saclay-wide initiative on GPU research, find the actors, discuss collaboration with NVIDIA

  • we, at the CDS, do not do research on GPUs; we usually use them through standard ML libraries

  • but mutualization can of course go beyond data science

Page 4: VIRTUALDATA@P2IO

Page 5: DATA SCIENCE

Design of automated methods to analyze massive and complex data to extract useful information

Page 6: DATA SCIENCE = BIG DATA

We are focusing on inference: data → knowledge

Interfacing with infrastructure, security, production

Page 7: UNIVERSITÉ PARIS-SACLAY

19 founding partners

Page 8: UNIVERSITÉ PARIS-SACLAY

+ horizontal multi-disciplinary and multi-partner initiatives (“lidex”) to create cohesion

Page 9: Center for Data Science Paris-Saclay

A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay

http://www.datascience-paris-saclay.fr/

@SaclayCDS

The Paris-Saclay Center for Data Science: Data Science for scientific Data

250 researchers in 35 laboratories

Biology & bioinformatics: IBISC/UEvry, LRI/UPSud, Hepatinov, CESP/UPSud-UVSQ-Inserm, IGM-I2BC/UPSud, MIA/Agro, MIAj-MIG/INRA, LMAS/Centrale

Chemistry: EA4041/UPSud

Earth sciences: LATMOS/UVSQ, GEOPS/UPSud, IPSL/UVSQ, LSCE/UVSQ, LMD/Polytechnique

Economy: LM/ENSAE, RITM/UPSud, LFA/ENSAE

Neuroscience: UNICOG/Inserm, U1000/Inserm, NeuroSpin/CEA

Particle physics, astrophysics & cosmology: LPP/Polytechnique, DMPH/ONERA, CosmoStat/CEA, IAS/UPSud, AIM/CEA, LAL/UPSud

Machine learning: LRI/UPSud, LTCI/Telecom, CMLA/Cachan, LS/ENSAE, LIX/Polytechnique, MIA/Agro, CMA/Polytechnique, LSS/Supélec, CVN/Centrale, LMAS/Centrale, DTIM/ONERA, IBISC/UEvry

Visualization: INRIA, LIMSI

Signal processing: LTCI/Telecom, CMA/Polytechnique, CVN/Centrale, LSS/Supélec, CMLA/Cachan, LIMSI, DTIM/ONERA

Statistics: LMO/UPSud, LS/ENSAE, LSS/Supélec, CMA/Polytechnique, LMAS/Centrale, MIA/AgroParisTech

LIST/CEA

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Domain science: human society, life, brain, earth, universe

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data scientist, Applied scientist, Domain scientist, Data engineer, Software engineer

Page 10: THE DATA SCIENCE ECOSYSTEM

Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data scientist, Data trainer, Applied scientist, Domain scientist, Software engineer, Data engineer

Page 11: TOOLS

We are designing and learning to manage tools to accompany data science projects with different needs

Page 12: TOOLS: LANDSCAPE TO ECOSYSTEM

Data domains: energy and physical sciences, health and life sciences, Earth and environment, economy and society, brain

Data science: statistics, machine learning, information retrieval, signal processing, data visualization, databases

Tool building: software engineering, clouds/grids, high-performance computing, optimization

Data scientist, Data trainer, Applied scientist, Domain expert, Software engineer, Data engineer

• interdisciplinary projects • matchmaking tool • design and innovation strategy workshops • data challenges

• coding sprints • Open Software Initiative • code consolidator and engineering projects

• bootcamps / hackathons • IT platform for linked data • annotation tool • SaaS data science platform

Page 13: TWO ANALYTICS TOOLS

RAPID ANALYTICS AND MODEL PROTOTYPING

DATA CHALLENGES

Page 14: DATA CHALLENGES

• A data challenge is a recently developed, unconventional dissemination and communication tool

  • a scientific or industrial data producer arrives with a well-defined problem and a corresponding annotated data set

  • defines a quantitative goal

  • makes the problem and part of the data set (the training set) public on a dedicated site

  • data science experts then take the public training data and submit solutions (predictions) for a test set with hidden annotations

  • submissions are evaluated numerically using the quantitative measure

  • contestants are listed on a leaderboard

  • after a predefined time, typically a couple of months, the final results are revealed and awards are given to the winners
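The evaluation loop described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not name a metric or a platform, so plain classification accuracy stands in for "the quantitative measure", and the function names, team names, and labels below are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of test items whose predicted label matches the hidden one."""
    if len(y_true) != len(y_pred):
        raise ValueError("a submission must predict every test item")
    if not y_true:
        raise ValueError("the test set must not be empty")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


def leaderboard(
    hidden_labels: Sequence[int],
    submissions: Dict[str, Sequence[int]],
) -> List[Tuple[str, float]]:
    """Score every submission and rank contestants, best score first."""
    scores = [
        (name, accuracy(hidden_labels, preds))
        for name, preds in submissions.items()
    ]
    # Sort by descending score; break ties by name so the ranking is deterministic.
    return sorted(scores, key=lambda row: (-row[1], row[0]))


# Hypothetical data: the organizer keeps these test annotations private.
hidden_test_labels = [1, 0, 1, 1, 0, 0, 1, 0]

# Hypothetical contestants and their predictions for the same eight test items.
submissions = {
    "team_a": [1, 0, 1, 0, 0, 0, 1, 0],  # 7 of 8 correct
    "team_b": [1, 1, 1, 1, 0, 1, 1, 1],  # 5 of 8 correct
    "team_c": [1, 0, 1, 1, 0, 0, 1, 0],  # 8 of 8 correct
}

ranking = leaderboard(hidden_test_labels, submissions)
for rank, (name, score) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {score:.3f}")
```

Contestants only ever see the training set; the hidden labels stay on the organizer's side, which is what lets the leaderboard compare submissions on equal terms.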

Page 15: DATA CHALLENGES

• Challenges are useful for

  • generating visibility in the data science community about novel application domains

  • benchmarking state-of-the-art techniques fairly on well-defined problems

  • finding talented data scientists

• Limitations

  • not necessarily suited to solving complex and open-ended data science problems in realistic environments

  • no direct access to the solutions or to the data scientists

  • emphasizes competition

Page 16: RAPID ANALYTICS AND MODEL PROTOTYPING

• Single-day coding sessions

  • 20-30 participants

  • preparation is similar to challenges

• Goals

  • focusing and motivating top talents

  • promoting collaboration, speed, and efficiency

  • solving (prototyping) real problems

Page 17: RAPID ANALYTICS AND MODEL PROTOTYPING

Page 18: RAPID ANALYTICS AND MODEL PROTOTYPING

Page 19: RAPID ANALYTICS AND MODEL PROTOTYPING

Page 20: RAPID ANALYTICS AND MODEL PROTOTYPING

Page 21: ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE

Page 22: RAPID ANALYTICS AND MODEL PROTOTYPING

2015 Jan 15: replaying the HiggsML challenge

Page 23: RAPID ANALYTICS AND MODEL PROTOTYPING

2015 Feb 9: Mortality prediction in septic patients

Page 24: RAPID ANALYTICS AND MODEL PROTOTYPING

2015 Apr 10: Classifying variable stars

Page 25: RAPID ANALYTICS AND MODEL PROTOTYPING

2015 May: Drug identification from spectra

Page 26: RAPID ANALYTICS AND MODEL PROTOTYPING

2015 June: Insect classification

Page 27: THANK YOU!