nvidia / pscds / upspr / ensae! laboratoire de statistique arnak dalalyan mdc / telecom paristech!...
Post on 11-Jul-2020
3 Views
Preview:
TRANSCRIPT
Pr / UPSud
LRI
CÉCILE GERMAIN
Pr / ENSAE
Laboratoire de Statistique
ARNAK DALALYAN
MdC / Telecom ParisTech
LTCI
ALEXANDRE GRAMFORT
1
NVIDIA / PSCDS / UPSACLAY MEETING
Center for Data ScienceParis-Saclay
March 30, 2015, LAL
MdC / Mines ParisTech
CGS
AKIN KAZAKÇI
DR / CNRS LAL & LRI
CNRS & University Paris-Sud
BALÁZS KÉGL
Center for Data ScienceParis-Saclay
• Why we launched this meeting?
• The Paris-Saclay CDS
• challenges, courses, hackatons
• Where are we going?
• the subject of the discussion after the talks
2
OUTLINE
Center for Data ScienceParis-Saclay
• We do research (both applied and basic) on deep learning
• We are running server-side-execution hackatons which need significant resources for very short periods
• typically 1 day every 6 weeks (not all of them are GPU-intensive)
• mutualization
• Discuss a Saclay-wise initiative on research on GPUs, find the actors, discuss collaboration with NVIDIA
• we, at the CDS, do not do research on GPUs, usually use them through standard ML libraries
• but mutualization can of course go beyond data science
3
WHY WE LAUNCHED THIS MEETING?
Center for Data ScienceParis-Saclay
VIRTUALDATA@P2IO
4
Center for Data ScienceParis-Saclay
DATA SCIENCE
5
Design of automated methods
to analyze massive and complex data
to extract useful information
Center for Data ScienceParis-Saclay6
DATA SCIENCE=
BIG DATA
We are focusing on inference:
data knowledge
Interfacing with infrastructure, security, production
7
UNIVERSITÉ PARIS-SACLAY
19 founding partners
Center for Data ScienceParis-Saclay
UNIVERSITÉ PARIS-SACLAY
8
+ horizontal multi-disciplinary and multi-partner initiatives (“lidex”) to create cohesion
Center for Data ScienceParis-Saclay9
Center for Data ScienceParis-Saclay
A multi-disciplinary initiative to define, structure, and manage the data science ecosystem at the Université Paris-Saclay
http://www.datascience-paris-saclay.fr/
Biology & bioinformaticsIBISC/UEvry LRI/UPSudHepatinovCESP/UPSud-UVSQ-Inserm IGM-I2BC/UPSud MIA/AgroMIAj-MIG/INRALMAS/Centrale
ChemistryEA4041/UPSud
Earth sciencesLATMOS/UVSQ GEOPS/UPSudIPSL/UVSQLSCE/UVSQLMD/Polytechnique
EconomyLM/ENSAE RITM/UPSudLFA/ENSAE
NeuroscienceUNICOG/InsermU1000/InsermNeuroSpin/CEA
Particle physics astrophysics & cosmologyLPP/Polytechnique DMPH/ONERACosmoStat/CEAIAS/UPSudAIM/CEALAL/UPSud
The Paris-Saclay Center for Data ScienceData Science for scientific Data
250 researchers in 35 laboratories
Machine learningLRI/UPSud LTCI/TelecomCMLA/Cachan LS/ENSAELIX/PolytechniqueMIA/AgroCMA/PolytechniqueLSS/SupélecCVN/Centrale LMAS/CentraleDTIM/ONERAIBISC/UEvry
VisualizationINRIALIMSI
Signal processingLTCI/TelecomCMA/PolytechniqueCVN/CentraleLSS/SupélecCMLA/CachanLIMSIDTIM/ONERA
StatisticsLMO/UPSud LS/ENSAELSS/SupélecCMA/PolytechniqueLMAS/CentraleMIA/AgroParisTech
Data sciencestatistics
machine learninginformation retrieval
signal processingdata visualization
databases
Domain sciencehuman society
life brain earth
universe
Tool buildingsoftware engineering
clouds/gridshigh-performance
computingoptimization
Data scientist
Applied scientist
Domain scientist
Data engineer
Software engineer
Center for Data ScienceParis-Saclay
datascience-paris-saclay.fr
@SaclayCDS
LIST/CEA
Center for Data ScienceParis-Saclay10
THE DATA SCIENCE ECOSYSTEM
Data domainsenergy and physical sciences
health and life sciences Earth and environment
economy and society brain
Data scientist
Data trainer
Applied scientist
Domain scientistSoftware engineer
Data engineer
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
Tool building software engineering
clouds/grids high-performance
computing optimization
Center for Data ScienceParis-Saclay
TOOLS
11
We are designing and learning to manage
tools
to accompany data science projects
with different needs
Center for Data ScienceParis-Saclay
TOOLS: LANDSCAPE TO ECOSYSTEM
12
Data scientist
Data trainer
Applied scientist
Domain expertSoftware engineer
Data engineer
Tool building Data domains
Data sciencestatistics
machine learning information retrieval
signal processing data visualization
databases
software engineeringclouds/grids
high-performancecomputing
optimization
energy and physical sciences health and life sciences Earth and environment
economy and society brain
• interdisciplinary projects • matchmaking tool • design and innovation strategy workshops • data challenges
• coding sprints • Open Software Initiative • code consolidator and engineering projects
• bootcamps / hackathons • IT platform for linked data • annotation tool • SaaS data science platform
Center for Data ScienceParis-Saclay
TWO ANALYTICS TOOLS
13
RAPID ANALYTICS AND MODEL PROTOTYPING
DATA CHALLENGES
Center for Data ScienceParis-Saclay
• A data challenge is a recently developed unconventional dissemination and communication tool
• a scientific or industrial data producer arrives with a well-defined problem and a corresponding annotated data set
• defines a quantitative goal
• makes the problem and part of the data set (the training set) public on a dedicated site
• data science experts then take the public training data and submit solutions (predictions) for a test set with hidden annotations
• submissions are evaluated numerically using the quantitative measure
• contestants are listed on a leaderboard
• after a predefined time, typically a couple of months, the final results are revealed and the winners are awarded
14
DATA CHALLENGES
Center for Data ScienceParis-Saclay
• Challenges are useful for
• generating visibility in the data science community about novel application domains
• benchmarking in a fair way state-of-the-art techniques on well-defined problems
• finding talented data scientists
• Limitations
• not necessary adapted to solving complex and open-ended data science problems in realistic environments
• no direct access to solutions and data scientist
• emphasizes competition
15
DATA CHALLENGES
Center for Data ScienceParis-Saclay16
• Single-day coding sessions
• 20-30 participants
• preparation is similar to challenges
• Goals
• focusing and motivating top talents
• promoting collaboration, speed, and efficiency
• solving (prototyping) real problems
RAPID ANALYTICS AND MODEL PROTOTYPING
17
RAPID ANALYTICS AND MODEL PROTOTYPING
Center for Data ScienceParis-Saclay18
RAPID ANALYTICS AND MODEL PROTOTYPING
19
RAPID ANALYTICS AND MODEL PROTOTYPING
20
RAPID ANALYTICS AND MODEL PROTOTYPING
Center for Data ScienceParis-Saclay21
ANALYTICS TOOLS TO PROMOTE COLLABORATION AND CODE REUSE
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
22
2015 Jan 15 replaying the
HiggsML challenge
Center for Data ScienceParis-Saclay
2015 Feb 9 Mortality prediction in septic patients
RAPID ANALYTICS AND MODEL PROTOTYPING
23
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
24
2015 Apr 10 Classifying variable stars
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
25
2015 May Drug identification from spectra
Center for Data ScienceParis-Saclay
RAPID ANALYTICS AND MODEL PROTOTYPING
26
2015 June Insect classification
Center for Data ScienceParis-Saclay27
THANK YOU!
top related