MesureHD: Development of new measurement, analysis and data-processing protocols adapted to high-resolution, high-throughput measurements by biophysical methods

Marc-André DELSUC, Christian ROLANDO

Colloque Mastodons - Masse de Données Scientifiques (Mastodons colloquium on scientific big data), CNRS, 22-23 January 2015


The MesureHD consortium

Patrick Combettes, Laboratoire Jacques-Louis Lions, Paris (INSMI)

Emilie Chouzenoux, Jean-Christophe Pesquet, Laboratoire d'Informatique Gaspard Monge, Marne-la-Vallée (INS2I)

Pierre Collet, I-Cube, Strasbourg

Marc-André Delsuc, Bruno Kieffer, IGBMC, Strasbourg (INSB)

Julia Chamot-Rooke, Institut Pasteur, Paris (INSB)

Pascale Roy, Synchrotron SOLEIL

Christian Rolando, MSAP, Lille (INC)

MS/MS Sequencing

As many MS/MS spectra as ions present in the MS spectrum.

Need to select precursors: data-dependent analysis.

[Figure: 1: MS spectrum; 2: precursor selection; 3: fragmentation/sequencing. Axes: relative intensity vs. m/z]

Exact mass and number of unique peptides in a genome

Theoretical prediction of the percentage of identified peptides as a function of mass (m/z) at different mass accuracies.

[Figure panels: Yeast; C. elegans]

Liu, T., Belov, M. E., Jaitly, N., Qian, W. J., & Smith, R. D. (2007). Accurate mass measurements in proteomics. Chemical Reviews, 107(8), 3621-3653.
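The point of this slide is that better mass accuracy lets more peptides be identified from their mass alone. Below is a minimal sketch of that effect. It assumes standard monoisotopic residue masses and uses a tiny, hypothetical peptide list rather than the genome-wide data of Liu et al. It reports which peptides stay unambiguous at a given ppm tolerance:

```python
# Monoisotopic residue masses in Da (standard values, 5 decimals).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per linear peptide (N-terminal H + C-terminal OH)


def peptide_mass(seq: str) -> float:
    """Neutral monoisotopic mass of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


def unique_by_mass(seqs: list[str], tol_ppm: float) -> list[str]:
    """Peptides whose mass differs from every other peptide by more than tol_ppm."""
    masses = [(s, peptide_mass(s)) for s in seqs]
    unique = []
    for s, m in masses:
        tol = m * tol_ppm * 1e-6
        if all(abs(m - m2) > tol for s2, m2 in masses if s2 != s):
            unique.append(s)
    return unique


peptides = ["AK", "AQ", "LA", "IA", "GGG"]  # hypothetical toy list, not a real proteome
for ppm in (1000, 10):
    print(f"{ppm:>5} ppm:", unique_by_mass(peptides, ppm))
```

K and Q differ by about 0.036 Da, which is roughly 170 ppm at m ≈ 217. They therefore separate at 10 ppm but not at 1000 ppm. L and I are isobaric and can never be distinguished by mass alone.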

Why 2D FT-ICR mass spectrometry

• MS: information on all compounds at the same time; parallel acquisition.

• MS/MS: one compound at a time; serial acquisition. One fragmentation mode at a time, either fragments or precursors.

• Two-dimensional FT-ICR MS (2D FT-ICR): all compounds at once, with both correlations (fragments and precursors), independently of sample complexity, but it requires recording a full data set.

A truly data-independent acquisition.

van Agthoven, M. A., Delsuc, M. A., Bodenhausen, G., & Rolando, C. (2013). Towards analytically useful two-dimensional Fourier transform ion cyclotron resonance mass spectrometry. Analytical and Bioanalytical Chemistry, 405(1), 51-61.

Principle of 2D NMR and 2D FT-ICR MS

2D NMR NOESY

Müller, L., Kumar, A., & Ernst, R. R. (1975). Two-dimensional carbon-13 NMR spectroscopy. The Journal of Chemical Physics, 63(12), 5490-5491.

P. Pfändler, G. Bodenhausen, J. Rapin, M.-E. Walser, T. Gäumann, J. Am. Chem. Soc. 110 (1988) 5625-5628.
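The 2D principle can be illustrated with an idealized, noise-free model. This is an assumption and not the actual FT-ICR pulse sequence, which relies on amplitude modulation and extra processing. In the model, a precursor frequency f1 is encoded along the incremented delay t1 and modulates a fragment signal detected along t2. A 2D FFT then places the cross peak at (F1, F2). Array sizes and frequencies are arbitrary:

```python
import numpy as np

n1, n2 = 64, 256                 # number of t1 increments, points per t2 transient
t1 = np.arange(n1)[:, None]      # one row per t1 increment (a new ion packet each time)
t2 = np.arange(n2)[None, :]      # detection time axis within each transient

# Idealized correlation: the precursor at f1 (cycles/sample along t1)
# modulates the fragment signal at f2 (cycles/sample along t2).
f1, f2 = 10 / n1, 40 / n2
data = np.exp(2j * np.pi * f1 * t1) * np.exp(2j * np.pi * f2 * t2)

# 2D Fourier transform: FFT along t1 and along t2.
spectrum = np.abs(np.fft.fft2(data))
peak = tuple(int(i) for i in np.unravel_index(np.argmax(spectrum), spectrum.shape))
print(peak)  # (10, 40): the cross peak sits at (F1 bin, F2 bin)
```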

2D NMR versus 2D FT-ICR MS

[Figure panels: 2D NMR; 2D FT-ICR MS]

Number of papers per year 2D NMR and 2D FT-ICR MS

Problems to be solved for analytically useful 2D FT-ICR

• Preserve FT-ICR resolution during in-cell fragmentation: early experiments used collision-induced dissociation (CID) with a gas, which degrades resolution because resolution is inversely proportional to pressure.

• Data handling: a 1D FT-ICR spectrum at full resolution typically has 1 mega points (4 megabytes). The theoretical size of a 2D FT-ICR spectrum is 16 petabytes… In comparison, 2D NMR is performed with 2048 × 2048 points (32 megabytes).

• Scintillation noise removal: for each t1 delay, a new packet of ions must be introduced, because MS/MS is a destructive process, unlike NMR, which reuses the same sample after spin relaxation.

van Agthoven, M. A., Delsuc, M. A., Bodenhausen, G., & Rolando, C. (2013). Towards analytically useful two-dimensional Fourier transform ion cyclotron resonance mass spectrometry. Analytical and Bioanalytical Chemistry, 405(1), 51-61.
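Simple arithmetic is enough to check the data-handling figures. The bytes-per-point values below are assumptions chosen to match the slide. FT-ICR uses 4 bytes per point. 2D NMR uses 8 bytes per point, for example a complex value stored as two 32-bit floats:

```python
from math import prod


def storage_bytes(points_per_dim, bytes_per_point):
    """Raw size of a dataset with the given number of points per dimension."""
    return prod(points_per_dim) * bytes_per_point


MiB = 2**20
ft_icr_1d = storage_bytes([2**20], 4)      # 1 mega points, 4 bytes each (assumed)
nmr_2d = storage_bytes([2048, 2048], 8)    # 8 bytes per point (assumed)
print(ft_icr_1d / MiB, nmr_2d / MiB)  # 4.0 32.0
```

The size is the product of the point counts. Each extra t1 increment in a 2D experiment therefore adds one full-length transient, so full resolution in both dimensions quickly becomes impractical.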

2D FT-ICR of Insulin (5.7 kDa), ECD dissociation

Creating focus from noise

Van Putten, E. G., Akbulut, D., Bertolotti, J., Vos, W. L., Lagendijk, A., & Mosk, A. P. (2011). Scattering lens resolves sub-100 nm structures with visible light. Physical Review Letters, 106(19), 193905.

Basis of signal treatment

Signal time series: P frequencies, uniform sampling, L points $x_0, \dots, x_{L-1}$.

Hankel matrix $H$ of size $M \times N$, with $H_{ij} = x_{i+j}$, $L = M + N - 1$ and $M < N$.

● Hankel matrix: same terms on the antidiagonals.
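Here is a minimal NumPy sketch of the Hankel construction. It uses 0-based indices, so H[i, j] = x[i + j] and L = M + N - 1. The test signal (P = 3 noise-free complex exponentials) and the sizes are illustrative assumptions:

```python
import numpy as np


def hankel(x, M):
    """M x N Hankel matrix with H[i, j] = x[i + j], where N = len(x) - M + 1."""
    N = len(x) - M + 1
    return np.array([x[i:i + N] for i in range(M)])


# Noise-free signal of length L containing P = 3 complex exponentials.
L = 64
n = np.arange(L)
freqs = [0.05, 0.12, 0.31]          # illustrative frequencies (cycles/sample)
x = sum(np.exp(2j * np.pi * f * n) for f in freqs)

H = hankel(x, M=20)                 # M < N and L = M + N - 1
print(H.shape, np.linalg.matrix_rank(H))  # (20, 45) 3
```

As long as M, N ≥ P, the rank of H equals the number of frequencies P. Rank-reduction denoisers exploit exactly this property.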

Cadzow procedure

The idea is to decompose H using singular value decomposition (SVD):

‣ compute the singular values;
‣ keep only the k largest singular values;
‣ reconstruct a denoised signal from the rank-reduced H matrix: projection of H onto a subspace, then averaging along the antidiagonals of H.

Cadzow, J. A. (1988). IEEE Trans. Acoust., Speech, Signal Process., 36, 49-62.
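The following NumPy sketch implements the Cadzow procedure. It takes the SVD of the Hankel matrix, truncates it to the k largest singular values, and averages along the antidiagonals. Cadzow's method is usually iterated (n_iter > 1). The signal, noise level, M and k are illustrative assumptions:

```python
import numpy as np


def hankel(x, M):
    """M x N Hankel matrix with H[i, j] = x[i + j]."""
    N = len(x) - M + 1
    return np.array([x[i:i + N] for i in range(M)])


def antidiagonal_average(H):
    """Back to a 1D signal: average the entries of H along each antidiagonal i + j = n."""
    M, N = H.shape
    total = np.zeros(M + N - 1, dtype=H.dtype)
    count = np.zeros(M + N - 1)
    for i in range(M):
        total[i:i + N] += H[i]
        count[i:i + N] += 1
    return total / count


def cadzow(x, M, k, n_iter=1):
    """SVD of the Hankel matrix, keep the k largest singular values, average back."""
    for _ in range(n_iter):
        U, s, Vh = np.linalg.svd(hankel(x, M), full_matrices=False)
        H_k = (U[:, :k] * s[:k]) @ Vh[:k]      # rank-k projection of H
        x = antidiagonal_average(H_k)
    return x


rng = np.random.default_rng(0)
n = np.arange(128)
clean = sum(np.exp(2j * np.pi * f * n) for f in (0.05, 0.12, 0.31))
noisy = clean + 0.5 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
denoised = cadzow(noisy, M=40, k=3)
print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))
```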

urQRd algorithm

‣ Build H: $M \times N$.
‣ Build a random matrix $\Omega$ of size $N \times K$, with K ~ number of signals and $K \ll M < N$.
‣ Sample H with it: $Y = H\Omega$ ($M \times K$), much smaller than H.
‣ Find the main axes of Y by QR decomposition ($Y = QR$), MUCH faster than SVD.
‣ Make a rank reduction of H using Q: $\tilde{H} = Q Q^{*} H$.
‣ Reconstruction, as with Cadzow.

urQRd: an efficient denoising of Fourier transform (ICR, Orbitrap) mass spectrometry data

Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C., & Delsuc, M. A. (2014). Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proceedings of the National Academy of Sciences, 111(4), 1385-1390.
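Each urQRd step maps onto a single NumPy call. The following is a minimal sketch of the steps listed above, not the Spike implementation. The complex Gaussian random matrix, K = 8, M = 40 and the test signal are assumptions:

```python
import numpy as np


def urqrd(x, M, K, rng):
    """urQRd-style denoising sketch: random sampling + QR instead of a full SVD."""
    N = len(x) - M + 1
    H = np.array([x[i:i + N] for i in range(M)])                  # M x N Hankel matrix
    Omega = rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))  # N x K random matrix
    Y = H @ Omega                      # M x K sample of the range of H
    Q, _ = np.linalg.qr(Y)             # orthonormal basis for the main axes of Y
    H_k = Q @ (Q.conj().T @ H)         # rank-K approximation Q Q* H
    # Reconstruction as with Cadzow: average along the antidiagonals.
    out = np.zeros(len(x), dtype=complex)
    count = np.zeros(len(x))
    for i in range(M):
        out[i:i + N] += H_k[i]
        count[i:i + N] += 1
    return out / count


rng = np.random.default_rng(0)
n = np.arange(128)
clean = sum(np.exp(2j * np.pi * f * n) for f in (0.05, 0.12, 0.31))
noisy = clean + 0.5 * (rng.standard_normal(128) + 1j * rng.standard_normal(128))
denoised = urqrd(noisy, M=40, K=8, rng=rng)
err_noisy = np.linalg.norm(noisy - clean)
err_urqrd = np.linalg.norm(denoised - clean)
print(err_noisy, err_urqrd)
```

The only factorization here is a QR of the small M × K matrix Y. This avoids the SVD of the full M × N Hankel matrix that Cadzow requires, and that saving is the source of the speed-up.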

2D FT-ICR: substance P, ESI, ECD, classical 2D FT spectrum

FT-ICR: 7 Tesla, Solarix, harmonized cell

Ionisation: ESI

Fragmentation: ECD

Analyte: substance P, 1 pmol/µL

Acquisition: F1, 2 k points, 45 minutes; F2, 128 k points; 4 GB file

Data treatment (open source): Spike

[Figure labels: scintillation noise; fragment line]

2D FT-ICR: substance P, ESI, ECD, urQRd denoised

Processing time proportional to file size.

Not limited by computer memory.

Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C., & Delsuc, M. A. (2014). Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proceedings of the National Academy of Sciences, 111(4), 1385-1390.

How to increase resolution in the first dimension

Make better use of the first data points: the filter diagonalization method (FDM)

Hu, H., Van, Q. N., Mandelshtam, V. A., & Shaka, A. J. (1998). Reference deconvolution, phase correction, and line listing of NMR spectra by the 1D filter diagonalization method. Journal of Magnetic Resonance, 134(1), 76-87.

Aizikov, K., & O'Connor, P. B. (2006). Use of the filter diagonalization method in the study of space charge related frequency modulation in Fourier transform ion cyclotron resonance mass spectrometry. Journal of the American Society for Mass Spectrometry, 17(6), 836-843.

Kozhinov, A. N., & Tsybin, Y. O. (2012). Filter diagonalization method-based mass spectrometry for molecular and macromolecular structure analysis. Analytical Chemistry, 84(6), 2850-2856.
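FDM itself is too long for a short example. As a stand-in, the sketch below uses the matrix pencil method. This is a different harmonic-inversion technique, but like FDM it fits frequencies directly to the time-domain points instead of reading them off the FFT grid. The signal, its frequencies and the pencil parameter are illustrative assumptions:

```python
import numpy as np


def matrix_pencil_freqs(x, P):
    """Estimate P frequencies (cycles/sample) of a sum of complex exponentials."""
    L = len(x)
    C = L // 2                                          # pencil parameter
    R = L - C
    Y = np.array([x[i:i + C + 1] for i in range(R)])    # R x (C+1) Hankel matrix
    Y1, Y2 = Y[:, :-1], Y[:, 1:]                        # pair shifted by one sample
    U, s, Vh = np.linalg.svd(Y1, full_matrices=False)
    U, s, Vh = U[:, :P], s[:P], Vh[:P]                  # keep P signal components
    G = (U.conj().T @ Y2 @ Vh.conj().T) / s[:, None]    # S^-1 U^H Y2 V, a P x P matrix
    z = np.linalg.eigvals(G)                            # poles z_p = exp(2i*pi*f_p)
    return np.sort(np.angle(z) / (2 * np.pi))


n = np.arange(32)
x = np.exp(2j * np.pi * 0.100 * n) + 0.8 * np.exp(2j * np.pi * 0.110 * n)
print(matrix_pencil_freqs(x, P=2))   # close to [0.10, 0.11]
```

With 32 points the FFT bin spacing is 1/32 ≈ 0.031 cycles/sample, about three times the 0.01 separation. The noise-free pencil still recovers both frequencies. With noise, such parametric methods degrade and need care in choosing P.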

How to increase resolution in the first dimension

NMR solution: skip points! Non-uniform sampling (NUS) and maximum entropy (MaxEnt) reconstruction

[Figure panels: FFT, 1024 points; FFT, first 512 points; FFT and MaxEnt, first 128 points; MaxEnt, random 128 out of 512 points]

Barna, J. C. J., Laue, E. D., Mayger, M. R., Skilling, J., & Worrall, S. J. P. (1987). Exponential sampling, an alternative method for sampling in two-dimensional NMR experiments. Journal of Magnetic Resonance (1969), 73(1), 69-77.

Delsuc, M. A. (1989). A new maximum entropy processing algorithm, with applications to nuclear magnetic resonance experiments. In Maximum entropy and Bayesian methods (pp. 285-290).

Hyberts, S. G., Arthanari, H., Robson, S. A., & Wagner, G. (2014). Perspectives in magnetic resonance: NMR in the post-FFT era. Journal of Magnetic Resonance, 241, 60-73.
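A "random 128 out of 512 points" acquisition comes from a sampling schedule. Here is a minimal sketch of one. It assumes the first point (t1 = 0) is always kept, which is a common convention but an assumption here. The seed and sizes are illustrative:

```python
import random


def nus_schedule(n_grid: int, n_sampled: int, seed: int = 0) -> list[int]:
    """Random NUS schedule: n_sampled distinct, sorted t1 indices out of n_grid."""
    rng = random.Random(seed)
    rest = rng.sample(range(1, n_grid), n_sampled - 1)   # t1 = 0 is always kept
    return sorted([0] + rest)


schedule = nus_schedule(512, 128)    # "random 128 out of 512 points"
print(len(schedule), schedule[:5], schedule[-1])
```

The sampled points reach up to the end of the 512-point grid, while "first 128 points" stops at t1 = 127. That longer time span is where the resolution gain comes from. The FFT cannot be applied to the gapped data directly, so a reconstruction step such as MaxEnt is needed.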

RECITAL (derived from FISTA) on 1D

Myoglobin MS, resolution 15 000, FFT: no isotopic resolution

[Figure panels: FISTA 20, noise 10; FISTA 20, noise 1]

Beck, A., & Teboulle, M. (2009). A fast iterative shrinkage-thresholding algorithm for linear inverse problems (FISTA). SIAM Journal on Imaging Sciences, 2(1), 183-202.
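These slides do not detail RECITAL's own modifications. The sketch below is therefore plain FISTA (Beck & Teboulle) applied to an ℓ1-regularized reconstruction of a sparse spectrum from non-uniformly sampled time points. It solves min over s of 0.5·||A s - y||² + λ·||s||₁, with A = (sampling) applied after (inverse FFT). The spectrum, sampling pattern and λ are hypothetical:

```python
import numpy as np


def soft_threshold(z, t):
    """Proximity operator of t*||.||_1 for complex vectors (shrinks magnitudes by t)."""
    mag = np.abs(z)
    return np.where(mag > t, (1 - t / np.maximum(mag, 1e-30)) * z, 0)


def fista_nus(y, idx, n, lam, n_iter=300):
    """Estimate a sparse length-n spectrum s from NUS samples y = ifft(s)[idx]."""
    def A(s):                                  # spectrum -> sampled time points
        return np.fft.ifft(s, norm="ortho")[idx]

    def AH(r):                                 # adjoint: zero-fill, then forward FFT
        full = np.zeros(n, dtype=complex)
        full[idx] = r
        return np.fft.fft(full, norm="ortho")

    s = np.zeros(n, dtype=complex)
    z, t = s.copy(), 1.0
    for _ in range(n_iter):
        # Gradient step of size 1 (valid since ||A|| <= 1 with unitary FFTs), then prox.
        s_new = soft_threshold(z - AH(A(z) - y), lam)
        t_new = (1 + np.sqrt(1 + 4 * t * t)) / 2
        z = s_new + ((t - 1) / t_new) * (s_new - s)     # Nesterov momentum
        s, t = s_new, t_new
    return s


rng = np.random.default_rng(1)
n = 256
s_true = np.zeros(n, dtype=complex)
s_true[[20, 45, 100]] = [1.0, 0.7, 0.5]
idx = np.sort(rng.choice(n, size=64, replace=False))   # keep 64 of 256 time points
y = np.fft.ifft(s_true, norm="ortho")[idx]
s_hat = fista_nus(y, idx, n, lam=0.01)
print(sorted(int(i) for i in np.argsort(np.abs(s_hat))[-3:]))  # three largest peaks
```

The soft-thresholding step is what makes the method a proximal one. The momentum term is the only difference from plain ISTA, and it improves the convergence rate from O(1/k) to O(1/k²).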

RECITAL (derived from FISTA) on 1D MS

Cytochrome c, m/z 773 (15+), resolution 7500, MS/MS, ETD 50 ms, m/z 200-2000

FFT, then MS-Align on the YADA-deconvoluted spectrum: completely wrong identification at 10, 15 and 20 ppm error.

FISTA 10, then MS-Align on the YADA-deconvoluted spectrum: 8 ETD fragments at 20 ppm; identification of the protein with high probability.

NUS & RECITAL: zoom on parent ion (doubly charged)

[Figure panels: non-NUS 2 k; NUS 4 k div 2; NUS 8 k div 4; NUS 16 k div 8; NUS 32 k div 16]

Precursor resolution increases with the undersampling ratio, as expected.

NUS & RECITAL: parent precursor profile (monoisotopic peak)

[Figure panels: non-NUS 2 k; NUS 4 k div 2; NUS 8 k div 4; NUS 16 k div 8; NUS 32 k div 16]

Precursor FWHM decreases in inverse proportion to the undersampling ratio.
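One reading of labels such as "NUS 16 k div 8" is a 16 k-point grid sampled with 16 k / 8 = 2 k points. This is an interpretation, but it is consistent with the "non-NUS 2 k" reference. Under it, every panel uses the same number of measured points while the time span grows with the undersampling ratio. For an ideal undamped, noise-free tone, the FWHM of the spectral peak scales as 1/(time span). The sketch below checks this numerically:

```python
import numpy as np


def fwhm(T, zero_fill=16):
    """FWHM, in cycles/sample, of the spectral peak of an undamped tone observed for T samples."""
    n = np.arange(T)
    x = np.exp(2j * np.pi * 0.25 * n)                   # noise-free tone, exactly on a grid bin
    spec = np.abs(np.fft.fft(x, n=zero_fill * T))       # zero-filling only refines the grid
    n_above = np.count_nonzero(spec >= spec.max() / 2)  # side lobes stay below half height
    return n_above / (zero_fill * T)


# Same number of measured points (2 k); the grid, hence the time span, grows.
base = fwhm(2048)
for ratio in (1, 2, 4, 8, 16):
    print(f"grid {2 * ratio} k, div {ratio}: FWHM shrinks by {base / fwhm(2048 * ratio):.2f}")
```

Real ICR transients decay, so in practice the gain stops once the sampled span exceeds the transient lifetime. The gapped NUS data must also be reconstructed (here by RECITAL) before this limit is reached.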

2D FT-ICR new processing: urQRd for NUS

[Figure: substance P, NUS 16 k div 8, fragment ion spectrum; peaks labelled with precursor-related ions (MH2 with losses of NH3, 2 NH3, H2O and CH3NO) and c4 to c10, z9, b2, b10 and a2 fragment ions]

2D FT-ICR new processing: urQRd for NUS

Substance P, NUS 16 k div 8

[Figure panels: parent ion spectrum, zoom; fragment ion spectrum, zoom]

ANR "Défi de tous les savoirs" 2014 call: One Shot 2D FT-ICR

Block-coordinate algorithms (-> 2015)

P. L. Combettes and J.-C. Pesquet, Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping, SIAM Journal on Optimization, under revision.

This new generation of algorithms adds a further level of decomposition of the variables: at each iteration, only some coordinates of the variables are processed, instead of all of them as in classical methods. This makes it possible to handle very large problems efficiently.
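The sketch below is a simpler relative of the cited work, not the Combettes-Pesquet stochastic quasi-Fejér scheme. It is a randomized block-coordinate proximal gradient method for a least-squares problem, with an optional ℓ1 term. Each iteration draws one block of coordinates at random and updates only that block, which is the decomposition described above. The matrix sizes, block size and iteration count are illustrative assumptions:

```python
import numpy as np


def soft(z, t):
    """Proximity operator of t*||.||_1 (real soft thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def random_block_prox_grad(A, b, block_size, lam=0.0, n_iter=3000, seed=0):
    """Minimize 0.5*||Ax - b||^2 + lam*||x||_1, updating one random block per iteration.

    The residual r = Ax - b is updated incrementally, so an iteration only touches A[:, block].
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    blocks = [np.arange(i, min(i + block_size, n)) for i in range(0, n, block_size)]
    steps = [1.0 / np.linalg.norm(A[:, blk], 2) ** 2 for blk in blocks]  # 1 / block Lipschitz constant
    x = np.zeros(n)
    r = -b.astype(float)
    for _ in range(n_iter):
        j = rng.integers(len(blocks))
        blk, step = blocks[j], steps[j]
        A_blk = A[:, blk]
        x_old = x[blk].copy()
        x[blk] = soft(x_old - step * (A_blk.T @ r), step * lam)
        r += A_blk @ (x[blk] - x_old)
    return x


rng = np.random.default_rng(1)
A = rng.standard_normal((60, 20))
b = rng.standard_normal(60)
x_bcd = random_block_prox_grad(A, b, block_size=5)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
print(np.max(np.abs(x_bcd - x_ls)))
```

With lam = 0 the iterates converge to the ordinary least-squares solution, as the comparison with np.linalg.lstsq shows. A positive lam adds ℓ1 shrinkage to each block update.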

Proximal methods: tools for solving inverse problems on a large scale (-> 2015)

Combettes, P. L., & Pesquet, J. C. (2011). Proximal splitting methods in signal processing. In Fixed-point algorithms for inverse problems in science and engineering (pp. 185-212). Springer.

Proximal methods: tools for solving inverse problems on a large scale. Application to DOSY

Use of GPU to accelerate urQRd (-> 2015)

NVIDIA Titan Black card

336 GB/s memory bandwidth

5 TFLOPS single precision

1.7 TFLOPS double precision

250 W, 1000 €

2880 cores at 889 MHz, 6 GB
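The throughput figures follow from the core count and the clock. This sketch rests on two assumptions not stated on the slide: each core performs one fused multiply-add (2 floating-point operations) per cycle, and this card's double-precision rate is one third of its single-precision rate:

```python
cores = 2880            # CUDA cores (from the slide)
clock_hz = 889e6        # clock (from the slide)
flops_per_cycle = 2     # one fused multiply-add per core per cycle (assumed)
fp64_fraction = 1 / 3   # double-precision rate relative to single precision (assumed)

sp_peak = cores * clock_hz * flops_per_cycle
dp_peak = sp_peak * fp64_fraction
print(f"{sp_peak / 1e12:.2f} TFLOPS SP, {dp_peak / 1e12:.2f} TFLOPS DP")  # 5.12 TFLOPS SP, 1.71 TFLOPS DP
```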

IR and THz synchrotron radiation for high-resolution spectroscopy

[Figure: methyl formate and propane spectra; THz region: pure rotations, torsions/rotations; vibrations/rotations]

Huge $N$ ($> 10^9$) and big $k$ (proportional to $N$)

Optimisation et processus dynamiques en apprentissage et dans les problèmes inverses (Optimization and dynamical processes in learning and inverse problems), 8-12 September 2014

The aim of this meeting is to stimulate discussion and foster new collaborations between Italian and French researchers on the following topics: algorithms for convex optimization and monotone inclusions, fixed-point methods, game theory, interactions between discrete and continuous dynamics, statistical learning theory, processing of large data sets, and inverse problems.

Organizer: Patrick Combettes

https://www.ljll.math.upmc.fr/~plc/sestri/

Complexity in Chemistry & Biology

http://chemcomplex2015.sciencesconf.org/

Organizer: Marc-André Delsuc & Bruno Kieffer

Opening lecture: Jean-Marie Lehn, Nobel Prize in Chemistry

110 participants from the different communities (data treatment, bioinformatics, chemistry, biology)

Complex Systems Digital Campus

Co-organizer: Pierre Collet

Creation of a start-up CASC4DE

Company founders: Marc-André Delsuc, Bruno Kieffer, Julia Chamot-Rooke, Christian Rolando and private partners

MesureHD: joint papers 2014

1 - Chiron, L., van Agthoven, M. A., Kieffer, B., Rolando, C. & Delsuc, M.-A. Efficient denoising algorithms for large experimental datasets and their applications in Fourier transform ion cyclotron resonance mass spectrometry. Proc. Natl. Acad. Sci. USA 111, 1385-1390 (2014).

2 - P. L. Combettes and J.-C. Pesquet, "Stochastic quasi-Fejér block-coordinate fixed point iterations with random sweeping," submitted to SIAM Journal on Optimization, under revision.

3 - F., Wagner, D., & Collet, P. (2014, January). Massively Parallel Generational GA on GPGPU Applied to Power Load Profiles Determination. In Artificial Evolution (pp. 227-239). Springer International Publishing.

http://www.springer.com/computer/ai/book/978-3-642-37958-1.

MesureHD: organized or co-organized conferences 2014

1 - Optimisation et processus dynamiques en apprentissage et dans les problèmes inverses, 8-12 September 2014, Sestri Levante, Italy (https://www.ljll.math.upmc.fr/~plc/sestri/). Organized by P. L. Combettes. Presentations by P. L. Combettes, Q. Van Nguyen (PhD student of Combettes), Jean-Christophe Pesquet and Emilie Chouzenoux.

2 - Chemical Complexity & Biology, 19-20 January 2015, Strasbourg (http://chemcomplex2015.sciencesconf.org/). Organized by M.-A. Delsuc and B. Kieffer, with the support of the Mastodons program. Presentations by Christian Rolando and Emilie Chouzenoux.

3 - Complex Systems Digital Campus 2015 (CS-DC 2015), First World E-Conference, 30 September-1 October 2015. Co-organizer: Pierre Collet.

Actors of the MesureHD consortium

[Figure: consortium collaborations (pre-existing, 2014, 2015) among P. Combettes, J.-C. Pesquet, E. Chouzenoux, P. Collet, M.-A. Delsuc, P. Roy, J. Chamot-Rooke and C. Rolando]

EUROANALYSIS XVIII

Bordeaux, 6th to 10th September 2015