Creation and Annotation of a Recurrent Spectral Library of CHO Cell Metabolites and Media Components. Kelly H. Telu, Ramesh Marupaka, Nirina R. Andriamaharavo, Yamil Simón-Manso, Yuxue Liang, Yuri A. Mirokhin, Xinjian Yan, Tallat H. Bukhari and Stephen E. Stein. 67th ASMS Conference on Mass Spectrometry and Allied Topics, June 5th, 2019.

Page 1

Creation and Annotation of a Recurrent Spectral Library of CHO Cell Metabolites and Media Components

Kelly H. Telu, Ramesh Marupaka, Nirina R. Andriamaharavo, Yamil Simón-Manso, Yuxue Liang, Yuri A. Mirokhin, Xinjian Yan, Tallat H. Bukhari and Stephen E. Stein

67th ASMS Conference on Mass Spectrometry and Allied Topics

June 5th, 2019

Page 2

Project Goal and Talk Outline

• Goal: Identify as many metabolites as possible

• Outline:

1. Introduction (Intro)

2. Recurrent spectral library for CHO cell metabolites and media (Library)

3. Improving identification accuracy (ID)

4. Strategy to annotate unidentified spectra (Annotate)

5. Strategy to assign confidence to MS/MS IDs (Confidence)

Page 3

Biomanufacturing

• We analyzed Chinese hamster ovary (CHO) cell metabolites and growth media components using LC-MS/MS as part of NIST’s Biomanufacturing Initiative.

• CHO cells are the predominant host cells for monoclonal antibody (mAb) production [1].

• mAbs are a class of biopharmaceutical estimated to be worth nearly $125 billion in global sales by 2020 [2].

• Metabolic profiles of CHO cells and culture media can be used to monitor process variability, improve biologic output and identify contamination.

1. Walsh, G. (2018). Biopharmaceutical benchmarks 2018. Nature Biotechnology, 36(12), 1137.

2. Ecker, D. M., Jones, S. D., & Levine, H. L. (2015). The therapeutic monoclonal antibody market. mAbs, 7(1), 9-14.

3. Image: https://commons.wikimedia.org/wiki/File:Antibody_IgG2.png

Intro 1/6

Page 4

Metabolite Identification using MS

[Figure: MS spectrum. x-axis: m/z (100 to 1200); y-axis: Relative Abundance (0 to 100); intensity scale 2.48E7; labeled peak at m/z 664.1169]

Mass can be searched in metabolite databases

High resolution and high mass accuracy data help reduce the number of possible identifications

Intro 2/6
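To illustrate why mass accuracy reduces the number of possible identifications, the following minimal sketch searches a candidate list with a parts-per-million (ppm) tolerance around the measured m/z 664.1169. The candidate names and masses, the function names, and the 50 ppm and 5 ppm tolerances are hypothetical values chosen for illustration; they are not taken from the talk or from any real database.

```python
def ppm_window(mz, ppm):
    """Return the (low, high) m/z bounds for a given ppm tolerance."""
    delta = mz * ppm * 1e-6
    return mz - delta, mz + delta


def search_by_mass(query_mz, candidates, ppm):
    """Return names of candidates whose m/z lies inside the ppm window."""
    low, high = ppm_window(query_mz, ppm)
    return [name for name, mz in candidates if low <= mz <= high]


# Hypothetical database entries (name, ion m/z); not real compounds.
CANDIDATES = [
    ("candidate_A", 664.1164),
    ("candidate_B", 664.1230),
    ("candidate_C", 664.1600),
]

QUERY_MZ = 664.1169  # labeled peak on the MS slide

hits_50ppm = search_by_mass(QUERY_MZ, CANDIDATES, ppm=50)
hits_5ppm = search_by_mass(QUERY_MZ, CANDIDATES, ppm=5)
print(hits_50ppm, hits_5ppm)
```

With these made-up candidates, tightening the tolerance from 50 ppm to 5 ppm shrinks the hit list, which is the effect the slide describes.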

Page 5

Metabolite Identification using MS/MS

[Figure: MS/MS spectrum. x-axis: m/z (50 to 650); y-axis: Relative Abundance (0 to 100); intensity scale 4.28E6; labeled peaks at m/z 97.0285, 136.0619, 232.0831, 250.0936, 348.0706, 410.0270, 428.0370, 444.0916, 524.0582, 542.0687, 664.1169]

Spectrum can be searched against a library to identify the metabolite

Intro 3/6

Page 6

NIST 17 Tandem Mass Spectral Library

15,243 compounds

123,881 precursor ions

574,826 spectra

Library composition (pie chart):

– Metabolites (human and plant): 60%

– Drugs: 20%

– Bioactive Peptides: 10%

– Also labeled, without percentages: Lipids; Sugars, Glycans; Pesticides, Forensics, etc.; Surfactants, Contaminants

3,500 human metabolites

400 dipeptides, 800 tryptic tripeptides

80% (+), 20% (-)

27% MS2 of in-source

17% MSn

Intro 4/6

https://chemdata.nist.gov/

Characterizing Product Ions in a Reference Tandem Mass Spectral Library – WP 422

Page 7

Library Searching

[Figure: screenshot of the library search software. Labeled panels: Spectrum List, Hit List, Distribution of Scores, Top Match, Measured Spectrum, Consensus (Library) Spectrum, Spectral Matching]

Intro 5/6

Evaluation of NIST Library Search Software – WP 311
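As a rough illustration of spectral matching between a measured spectrum and a library spectrum, the sketch below computes a cosine similarity over matched peaks and scales it to 0-999. This is a generic, simplified stand-in and not the scoring used by the NIST search software; the function name, the 0.01 m/z tolerance, and the intensities in the example are assumptions.

```python
import math


def match_score(query, library, tol=0.01):
    """Cosine similarity between two peak lists, scaled to 0-999.

    Peaks are (m/z, intensity) tuples. Each library peak is paired with
    at most one query peak, choosing the closest m/z within `tol`.
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for i, (l_mz, _) in enumerate(library):
            if i in used or abs(q_mz - l_mz) > tol:
                continue
            if best is None or abs(q_mz - l_mz) < abs(q_mz - library[best][0]):
                best = i
        if best is not None:
            used.add(best)
            dot += q_int * library[best][1]
    q_norm = math.sqrt(sum(i * i for _, i in query))
    l_norm = math.sqrt(sum(i * i for _, i in library))
    if q_norm == 0 or l_norm == 0:
        return 0
    return round(999 * dot / (q_norm * l_norm))


# m/z values appear on the MS/MS slide; intensities are hypothetical.
measured = [(136.0619, 40.0), (428.0370, 100.0), (524.0582, 60.0)]
partial = [(136.0619, 40.0), (428.0370, 100.0)]
print(match_score(measured, measured), match_score(measured, partial))
```

A hit list would then be the library entries sorted by this score, with the top match shown against the measured spectrum.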

Page 8

Definitions

• Consensus Spectra – Representative spectra produced by processing similar experimental spectra for inclusion in a library.

• Recurrent Spectra – Spectra that occur repeatedly in the sample.

• Hybrid Search – A type of MS/MS search that matches ions differing by an inert chemical group, so it can often match unidentified spectra with library members of the same chemical class. The term Delta Mass denotes the difference in mass between the query spectrum and the library entry.

– Burke, M.C., et al., J. Proteome Res. 2017, 16(5), 1924-1935

– Blazenovic, I., et al., Analytical Chemistry 2019, 91(3), 2155-2162

Intro 6/6
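The hybrid search idea defined above can be sketched as follows: a query peak counts as matched if it lines up with a library peak directly, or with a library peak shifted by Delta Mass. This is a simplified illustration and not the published algorithm; the function name, the 0.01 m/z tolerance, and the query spectrum (built by adding a 42.0106 shift to tyrosine peaks shown elsewhere in the talk) are assumptions.

```python
def hybrid_matches(query_peaks, library_peaks,
                   query_precursor, library_precursor, tol=0.01):
    """Classify query peaks as direct or Delta-Mass-shifted matches.

    Returns (delta_mass, direct, shifted), where `direct` and `shifted`
    are lists of query m/z values.
    """
    delta_mass = query_precursor - library_precursor
    direct, shifted = [], []
    for q in query_peaks:
        if any(abs(q - l) <= tol for l in library_peaks):
            direct.append(q)
        elif any(abs(q - (l + delta_mass)) <= tol for l in library_peaks):
            shifted.append(q)
    return delta_mass, direct, shifted


# Library entry: tyrosine [M+H]+ peaks as labeled on the slides.
LIBRARY_PRECURSOR = 182.0815
LIBRARY_PEAKS = [136.0759, 165.0549, 123.0442]

# Hypothetical query: the same core with an extra 42.0106 group.
QUERY_PRECURSOR = 224.0921
QUERY_PEAKS = [136.0759, 207.0655, 178.0865]

delta, direct, shifted = hybrid_matches(
    QUERY_PEAKS, LIBRARY_PEAKS, QUERY_PRECURSOR, LIBRARY_PRECURSOR)
print(round(delta, 4), direct, shifted)
```

A direct search would credit only the unshifted peak here, while the hybrid approach also credits the two shifted peaks, which is why it can suggest a chemical class for compounds that are absent from the library.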

Page 9

Ion Plot of CHO Metabolite Extraction Sample

• Each circle is a cluster of similar MS/MS spectra

• 94% of these clusters are not identified

– Direct MS/MS Search

– Score cutoff of 400

[Figure: ion plot of log10(Abundance) (6 to 11) versus RT/min (0 to 30); legend: Identified, No ID; searched against NIST 17 MS/MS Library]

Library 1/5

Page 10

Recurrent Unidentified Spectral Libraries

• Uses:

1. Identify any ion by fragmentation fingerprint

2. Has ion been seen before?

• How often, and under what conditions?

3. Assign class ID for compounds not in library

• Or not commercially available

4. Enables library ‘evolution’ via user feedback

• Identify once = Identify always

• Developed libraries derived from:

– CHO cells and culture media

– 7 Urine SRMs

– 8 Serum/Plasma SRMs

– Glycans

Library Creation

[Flowchart with the following boxes: Preparation of metabolites and media; LC-MS/MS analysis; Searching against NIST library; Identified Spectra; Unidentified Spectra; Generation of consensus spectra; Clustering and annotation of spectra; Annotation of libraries]

https://chemdata.nist.gov/

Library 2/5

Page 11

Materials and Methods

• Metabolite Extraction Methods

– Resuspension in 50% Acetonitrile; Methanol/Chloroform/Water; Freeze/Thaw in Methanol

• Fresh/Spent Media Preparation

– Precipitation in 80% methanol, Drying completely, Resuspension in 100% methanol or 50% acetonitrile

• LC Methods

– Reversed phase and HILIC LC methods – Aqueous phase of metabolite extractions and media samples

– Lipid LC method – Organic phase of metabolite extraction

• MS Methods

– Positive and Negative mode

– Beam-type Collision Cell (BTCC) with wide energy range and ion trap (IT) fragmentation

Library 3/5

Page 12

Searched Against Recurrent Unidentified Spectral (RUS) Library

• Now 49% of clusters are not identified.

• RUS Library

– Represents all detectable metabolites, both known and unknown.

– Covers all known fragmentation conditions and precursors.

– Combines CHO metabolites and media components because they are difficult to separate completely. The origin of each spectrum is labeled (CHO, media, both).

• Differences from Tandem Library:

– Contains unidentified spectra

– Metabolite extracts vs. single compounds analyzed

[Figure: ion plot of log10(Abundance) (6 to 11) versus RT/min (0 to 30); legend: NIST 17 ID, Recurrent, No ID; searched against Recurrent Library]

Library 4/5

Page 13

High Charge (4+ or Higher) Clusters

• Present only in the 50% acetonitrile extractions

• Unexpected

– 79% are 1+

– 9% are 2+

– 6% are 3+

• Suspected large peptides/small proteins

• Red ones manually identified

[Figure: ion plot of log10(Abundance) (8 to 10) versus RT/min (5 to 25); legend: Recurrent, No ID, Manual ID; labeled clusters: α Thymosin, β Thymosins, Ubiquitin]

Library 5/5

Page 14

Improving MS/MS Identification Scoring

• Initially, the score was based only on the quality of the spectral match.

• Led to 3 types of match problems

1. In-source fragmentation match with a higher score compared to the non in-source match

2. Uncommon loss match with a higher score than a common neutral loss match

3. Hybrid search match with a higher score than direct match

• Solution – Changed scoring to give greater priority to some identifications with higher probability of being correct:

1. [M+H]+ precursor matches

2. [M+Na]+ and [M+H-NH3]+ precursor matches

3. [M+H-H2O]+ and [M]+ precursor matches

4. Hybrid search matches

5. Other in-source precursor matches

ID 1/4
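The priority list above can be read as a tiered ranking, sketched below: hits are ordered first by match-type tier and then by score within a tier. How the priorities are actually combined with the score is not stated on the slide, so the strict tiering, the dictionary field names, and the "hybrid" label are assumptions. The three example hits and their scores come from the in-source fragmentation example in this talk.

```python
# Tier per match type, following the numbered list on the slide.
PRIORITY = {
    "[M+H]+": 1,
    "[M+Na]+": 2,
    "[M+H-NH3]+": 2,
    "[M+H-H2O]+": 3,
    "[M]+": 3,
    "hybrid": 4,  # hypothetical label for hybrid search matches
}
OTHER_IN_SOURCE = 5  # any other in-source precursor type


def rank_hits(hits):
    """Sort hits by priority tier, then by descending score."""
    return sorted(
        hits,
        key=lambda h: (PRIORITY.get(h["match_type"], OTHER_IN_SOURCE),
                       -h["score"]),
    )


hits = [
    {"name": "N-(alpha-Linolenoyl)tyrosine",
     "match_type": "[M+H-C18H28O]+", "score": 964},
    {"name": "N-Acetyl-L-tyrosine",
     "match_type": "[M+H-C2H2O]+", "score": 962},
    {"name": "L-Tyrosine", "match_type": "[M+H]+", "score": 958},
]

ranked = rank_hits(hits)
print([h["name"] for h in ranked])
```

Under this reading, the protonated L-Tyrosine match moves to the top even though its raw score is slightly lower than the two in-source matches.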

Page 15

In-source Fragmentation Match with Higher Score

Score 964: N-(α-Linolenoyl)tyrosine [M+H-C18H28O]+

Score 962: N-Acetyl-L-tyrosine [M+H-C2H2O]+

Score 958: L-Tyrosine [M+H]+

[Figure: MS/MS spectrum. x-axis: m/z (50 to 190); y-axis: Relative Abundance (0 to 100); intensity scale 3.89E7; labeled peaks at m/z 119.0493, 123.0442, 136.0759, 147.0443, 165.0549, 182.0815]

ID 2/4

Page 16

Uncommon Loss Match with Higher Score

Score 984: N-(α-Linolenoyl)tyrosine [M+H-C18H31ON]+

Score 982: L-Tyrosine [M+H-NH3]+

[Figure: MS/MS spectrum. x-axis: m/z (50 to 170); y-axis: Relative Abundance (0 to 100); intensity scale 6.84E7; labeled peaks at m/z 91.0544, 95.0493, 103.0544, 109.0650, 119.0493, 123.0442, 137.0599, 147.0443, 165.0549]

Priorities just put hits with higher probability on top.

Still need an analyst to determine if the top hit is the correct one.

ID 3/4

Page 17

Identification Reassignment

• Incorrect match types that the improved scoring did not resolve:

1. Uncommon metabolite/not a metabolite match with a higher score compared to a common one

2. Too few peaks in the spectra to result in a match above the score cutoff of 400

• Incorrect match type introduced by improved scoring:

– Protonated uncommon metabolite match with a higher score than a common metabolite with a common neutral loss

• Ex: protonated pyroglutamic acid instead of protonated glutamic acid with a water loss

• Solution:

– Semi-automated approach that reassigns an identification based on a manually curated list of masses and their appropriate identifications

• IDs discovered by hybrid search also included

ID 4/4
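A minimal sketch of the reassignment step, assuming the curated list is a table of precursor m/z values and the identification to assign when a cluster falls within a mass tolerance. The function name, the 10 ppm tolerance, and the single table entry are illustrative assumptions; the m/z of 130.0499 is a calculated value for protonated glutamic acid after water loss (the same nominal ion as protonated pyroglutamic acid), not a number from the slides.

```python
# Curated (precursor m/z, identification) pairs; illustrative entry only.
CURATED = [
    (130.0499, "L-Glutamic acid [M+H-H2O]+"),
]


def reassign(precursor_mz, current_id, curated=CURATED, ppm=10.0):
    """Return the curated identification if the precursor m/z matches
    a curated mass within `ppm`; otherwise keep the current one."""
    for mz, new_id in curated:
        if abs(precursor_mz - mz) <= mz * ppm * 1e-6:
            return new_id
    return current_id


fixed = reassign(130.0500, "Pyroglutamic acid [M+H]+")
kept = reassign(182.0815, "L-Tyrosine [M+H]+")
print(fixed, "|", kept)
```

Because the list is curated by hand, an analyst still decides which masses belong in it; the code only applies those decisions consistently.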

Page 18

Annotation of Recurrent Unidentified Spectra

[Flowchart with the following boxes:]

– Remove if Peak Width >30 s

– Remove if MaxPrecRelProd >10

– Remove if Spectral Purity <80

– Label as Contaminated

– Label as Background

– Label as Insufficient Fragmentation

– Determine if Peak is in a Blank (Yes / No)

– Narrow Peak Width / Broad Peak Width

– Label as Background / Label as Artifact/Carryover

– Label as Unknown

Annotation 1/2
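One plausible reading of this flowchart is sketched below as a decision function. The thresholds (30 s, 10, 80) and the label names come from the slide, but the pairing of each filter with a label, the order of the checks, and the narrow/broad split in the blank branch are assumptions, and the parameter names are hypothetical.

```python
def annotate(peak_width_s, max_prec_rel_prod, spectral_purity,
             in_blank, narrow_peak):
    """Assign an annotation label to a recurrent unidentified spectrum.

    Assumed mapping (not confirmed by the source):
      peak width > 30 s    -> Background
      MaxPrecRelProd > 10  -> Insufficient Fragmentation
      spectral purity < 80 -> Contaminated
      in blank, narrow     -> Artifact/Carryover
      in blank, broad      -> Background
      otherwise            -> Unknown
    """
    if peak_width_s > 30:
        return "Background"
    if max_prec_rel_prod > 10:
        return "Insufficient Fragmentation"
    if spectral_purity < 80:
        return "Contaminated"
    if in_blank:
        return "Artifact/Carryover" if narrow_peak else "Background"
    return "Unknown"


# Hypothetical cluster: passes all filters and is absent from the blank.
label = annotate(peak_width_s=12, max_prec_rel_prod=2,
                 spectral_purity=95, in_blank=False, narrow_peak=True)
print(label)
```

Expressing the rules as one function is a step toward the automation of annotation listed in the talk's next steps, although the real criteria may differ from this sketch.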

Page 19

Examples of Annotations – EICs

[Figure: four extracted ion chromatogram (EIC) panels. x-axis: Time (min), 0 to 30; y-axis: Relative Abundance, 0 to 100; traces: Sample, Blank; intensity scales shown: 4.22E8, 8.83E7, 3.61E8, 5.67E7]

– Identified – Tyrosine [M+H]+: EIC of 182.0815

– Unknown: EIC of 302.1965

– Artifact/Carryover: EIC of 757.544

– Background: EIC of 111.0203

Annotation 2/2

Page 20

Assigning Confidence – Med Initial Confidence

• Developed a qualitative confidence level to provide further information about probability of the MS/MS match being correct

• Complementary to confidence levels proposed in the literature

– Confidence Levels in IDs: Schymanski, et al., Environmental Science & Technology 2014, 48(4), 2097-8

[Flowchart: workflow for Med initial confidence (Score 600-799). Yes/No decision questions: Is the spectrum annotated as artifact/carryover? Is the spectrum annotated as unknown? Is the identification a known metabolite? Each branch ends in a final confidence of High, Med, or Low.]

Confidence 1/3

Page 21

Workflow for High Initial Confidence

[Flowchart: workflow for High initial confidence (Score 800-999). Yes/No decision questions: Is the spectrum annotated as artifact/carryover? Is the spectrum annotated as unknown? Is the identification a known metabolite? Each branch ends in a final confidence of High, Med, or Low.]

Confidence 2/3

Page 22

Workflow for Low Initial Confidence

[Flowchart: workflow for Low initial confidence (Score 400-599). Yes/No decision questions: Is the spectrum annotated as artifact/carryover? Is the spectrum annotated as unknown? Is the identification a known metabolite? Each branch ends in a final confidence of Med or Low.]

Confidence 3/3
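The starting point of the three confidence workflows is the score range, which can be sketched as a simple binning function. The ranges (400-599 Low, 600-799 Med, 800-999 High) and the cutoff of 400 are from the slides; the function name and the choice to return None below the cutoff are assumptions, and the later Yes/No adjustments are not modeled here.

```python
def initial_confidence(score):
    """Map an MS/MS match score to the initial confidence level.

    Returns None for scores below the identification cutoff of 400
    or above the maximum score of 999.
    """
    if 800 <= score <= 999:
        return "High"
    if 600 <= score <= 799:
        return "Med"
    if 400 <= score <= 599:
        return "Low"
    return None


# Scores 958 and 982 appear in the tyrosine examples in this talk.
levels = [initial_confidence(s) for s in (958, 982, 650, 400, 399)]
print(levels)
```

The final confidence would then be raised or lowered from this initial level according to the annotation and known-metabolite questions in each workflow.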

Page 23

Summary

• Created a recurrent spectral library for CHO cell metabolites and media

• Improved MS/MS identification accuracy

• Developed a strategy to annotate unidentified spectra

• Developed a strategy to assign confidence to MS/MS IDs

• Next Steps:

– Make library available at https://chemdata.nist.gov/

– Automate the annotation and assignment of confidence

– Apply library to analysis of samples

• Manuscript is being prepared

Page 24

CHO Cell Growth:

Biomolecular Labeling Lab (BL2) at the Institute for Bioscience and Biotechnology Research (IBBR)

– Zvi Kelman

– Renae Preston

– Lila Kashi

Thanks for your Attention

Have questions or want to talk?

BOOTH 616 – Today from 2:30-5:00 pm

Tytus Mak – WOA 3:50 pm, here