Post on 01-Oct-2020
Creation and Annotation of a Recurrent Spectral Library of CHO Cell Metabolites and Media Components
Kelly H. Telu, Ramesh Marupaka, Nirina R. Andriamaharavo, Yamil Simón-Manso, Yuxue Liang, Yuri A. Mirokhin, Xinjian Yan, Tallat H. Bukhari and Stephen E. Stein
67th ASMS Conference on Mass Spectrometry and Allied Topics
June 5th, 2019
Project Goal and Talk Outline
• Goal: Identify as many metabolites as possible
• Outline:
1. Introduction (Intro)
2. Recurrent spectral library for CHO cell metabolites and media (Library)
3. Improving identification accuracy (ID)
4. Strategy to annotate unidentified spectra (Annotate)
5. Strategy to assign confidence to MS/MS IDs (Confidence)
Biomanufacturing
• We analyzed Chinese hamster ovary (CHO) cell metabolites and growth media components using LC-MS/MS as part of NIST’s Biomanufacturing Initiative.
• CHO cells are the predominant host cells for monoclonal antibody (mAb) production [1].
• mAbs are a class of biopharmaceutical estimated to be worth nearly $125 billion in global sales by 2020 [2].
• Metabolic profiles of CHO cells and culture media can be used to monitor process variability, improve biologic output and identify contamination.
1. Walsh, G. (2018). Biopharmaceutical benchmarks 2018. Nature Biotechnology, 36(12), 1137.
2. Ecker, D. M., Jones, S. D., & Levine, H. L. (2015). The therapeutic monoclonal antibody market. mAbs, 7(1), 9-14.
3. Image: https://commons.wikimedia.org/wiki/File:Antibody_IgG2.png
Intro 1/6
Metabolite Identification using MS
[Figure: MS spectrum, relative abundance (0 to 100) vs. m/z (100 to 1200); intensity scale 2.48E7; labeled peak at m/z 664.1169]
• Mass can be searched in metabolite databases
• High resolution and high mass accuracy data help reduce the number of possible identifications
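The effect of mass accuracy on the number of candidate identifications can be illustrated with a small sketch. It assumes a symmetric tolerance expressed in parts per million (ppm); the function names and the three database entries are hypothetical placeholders, not real compounds or a real database API. Only the observed m/z of 664.1169 comes from the slide.

```python
def ppm_window(mz, tol_ppm):
    """Return (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * tol_ppm / 1e6
    return mz - delta, mz + delta


def candidates_within(mz, tol_ppm, database):
    """Return names of database entries whose ion m/z falls inside the window."""
    low, high = ppm_window(mz, tol_ppm)
    return [name for name, db_mz in database.items() if low <= db_mz <= high]


# Hypothetical entries for illustration only; these are not real compounds.
toy_database = {
    "candidate_A": 664.1164,
    "candidate_B": 664.1302,
    "candidate_C": 664.1180,
}

observed_mz = 664.1169  # m/z value shown on the slide

# A wider tolerance admits more candidates than a tighter one.
print(candidates_within(observed_mz, 5.0, toy_database))
print(candidates_within(observed_mz, 1.0, toy_database))
```

With these made-up entries, tightening the tolerance from 5 ppm to 1 ppm removes one of the two surviving candidates, which is the sense in which higher mass accuracy narrows the list.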
Intro 2/6
Metabolite Identification using MS/MS
[Figure: MS/MS spectrum, relative abundance (0 to 100) vs. m/z (50 to 650); intensity scale 4.28E6; labeled peaks at m/z 97.0285, 136.0619, 232.0831, 250.0936, 348.0706, 410.0270, 428.0370, 444.0916, 524.0582, 542.0687 and 664.1169]
• Spectrum can be searched against a library to identify the metabolite
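Matching a measured spectrum against a library spectrum can be sketched as a peak-pairing step followed by a similarity score. The example below uses a plain cosine similarity scaled to 0-999; this is a common textbook choice and an assumption here, not the scoring used by the NIST search software. The function names, the 0.01 m/z tolerance and all intensities are hypothetical; only the m/z values are taken from the slide.

```python
import math


def match_peaks(query, library, tol=0.01):
    """Pair each query peak with the closest unused library peak within tol (m/z)."""
    pairs, used = [], set()
    for q_mz, q_int in query:
        best = None
        for i, (l_mz, _) in enumerate(library):
            if i in used or abs(q_mz - l_mz) > tol:
                continue
            if best is None or abs(q_mz - l_mz) < abs(q_mz - library[best][0]):
                best = i
        if best is not None:
            used.add(best)
            pairs.append((q_int, library[best][1]))
    return pairs


def cosine_score(query, library, tol=0.01):
    """Cosine similarity of matched peak intensities, scaled to 0-999."""
    pairs = match_peaks(query, library, tol)
    dot = sum(q * l for q, l in pairs)
    norm = math.sqrt(sum(i * i for _, i in query)) * math.sqrt(
        sum(i * i for _, i in library)
    )
    return round(999 * dot / norm) if norm else 0


# m/z values from the slide; intensities are invented for illustration.
library_spectrum = [(136.0619, 100.0), (428.0370, 80.0), (524.0582, 60.0)]
query_spectrum = [(136.0621, 95.0), (428.0368, 85.0), (524.0585, 50.0), (300.0000, 5.0)]

print(cosine_score(query_spectrum, library_spectrum))
```

Unmatched peaks, such as the extra query peak at m/z 300, lower the score because they add to the normalization term without contributing to the dot product.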
Intro 3/6
NIST 17 Tandem Mass Spectral Library
15,243 compounds
123,881 precursor ions
574,826 spectra
Metabolites (human and plant): 60%
Bioactive peptides: 10%
Drugs: 20%
Lipids, sugars, glycans
Pesticides, forensics, etc.
Surfactants, contaminants
3,500 human metabolites
400 dipeptides, 800 tryptic tripeptides
80% (+), 20% (-)
27% MS2 of in-source
17% MSn
Intro 4/6
https://chemdata.nist.gov/
Characterizing Product Ions in a Reference Tandem Mass Spectral Library – WP 422
Library Searching
[Figure: library search software window with panels labeled Spectrum List, Distribution of Scores, Top Match, Consensus (Library) Spectrum, Measured Spectrum, Hit List and Spectral Matching]
Intro 5/6
Evaluation of NIST Library Search Software – WP 311
Definitions
• Consensus Spectra – Spectra that are a result of processing similar experimental spectra to produce a representative spectrum to put in a library.
• Recurrent Spectra – Spectra that occur repeatedly in the sample.
• Hybrid Search – A type of MS/MS search that finds ions that differ by an inert chemical group, hence can often match unidentified spectra with members of the same chemical classes that are present in the library. The term Delta Mass is used to represent the difference in mass between the query spectrum and library entry.
– Burke, M.C., et al., J. Proteome Res. 2017, 16(5), 1924-1935
– Blazenovic, I., et al., Analytical chemistry 2019, 91(3), 2155-2162
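The Delta Mass idea behind the hybrid search can be sketched as follows: a library peak counts as matched if the query contains a peak at the same m/z (direct) or at the library m/z plus the precursor mass difference (shifted). This is a simplified illustration under that assumption, not the published hybrid search algorithm. The function names, the tolerance, the query precursor and the query peak list are hypothetical; the library precursor and peak values are tyrosine values that appear elsewhere in the slides.

```python
def find_within(target, peaks, tol):
    """Return the first peak m/z within tol of target, or None."""
    for mz in peaks:
        if abs(mz - target) <= tol:
            return mz
    return None


def hybrid_matches(query_peaks, query_prec, lib_peaks, lib_prec, tol=0.01):
    """Match library peaks to query peaks directly or after a Delta Mass shift."""
    delta = query_prec - lib_prec  # Delta Mass between query and library entry
    matches = []
    for l_mz in lib_peaks:
        hit = find_within(l_mz, query_peaks, tol)
        if hit is not None:
            matches.append((l_mz, hit, "direct"))
            continue
        hit = find_within(l_mz + delta, query_peaks, tol)
        if hit is not None:
            matches.append((l_mz, hit, "shifted"))
    return delta, matches


# Library entry: tyrosine [M+H]+ values taken from the slides.
lib_prec = 182.0815
lib_peaks = [136.0759, 165.0549, 123.0442]

# Hypothetical query: one fragment carries an extra group of about 42.0106 Da.
query_prec = 224.0921
query_peaks = [136.0759, 207.0655, 123.0442]

delta, matches = hybrid_matches(query_peaks, query_prec, lib_peaks, lib_prec)
for lib_mz, query_mz, kind in matches:
    print(lib_mz, query_mz, kind)
```

In this toy case two fragments match directly and one matches only after the shift, which is how a hybrid search can relate an unidentified spectrum to a library entry of the same chemical class.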
Intro 6/6
Ion Plot of CHO Metabolite Extraction Sample
• Each circle is a cluster of similar MS/MS spectra
• 94% of these clusters are not identified
– Direct MS/MS Search
– Score cutoff of 400
Library 1/5
[Figure: ion plot, log10(Abundance) (6 to 11) vs. RT/min (0 to 30); legend: Identified, No ID; searched against NIST 17 MS/MS Library]
Recurrent Unidentified Spectral Libraries
• Uses:
1. Identify any ion by fragmentation fingerprint
2. Has ion been seen before?
• How often, and under what conditions?
3. Assign class ID for compounds not in library
• Or not commercially available
4. Enables library ‘evolution’ via user feedback
• Identify once = Identify always
• Developed libraries derived from:
– CHO cells and culture media
– 7 Urine SRMs
– 8 Serum/Plasma SRMs
– Glycans
Library Creation
Identified Spectra
Unidentified Spectra
Preparation of metabolites and media
Searching against NIST library
LC-MS/MS analysis
Generation of consensus spectra
Clustering and annotation of spectra
Annotation of libraries
https://chemdata.nist.gov/
Library 2/5
Materials and Methods
• Metabolite Extraction Methods
– Resuspension in 50% Acetonitrile; Methanol/Chloroform/Water; Freeze/Thaw in Methanol
• Fresh/Spent Media Preparation
– Precipitation in 80% methanol, Drying completely, Resuspension in 100% methanol or 50% acetonitrile
• LC Methods
– Reversed phase and HILIC LC methods – Aqueous phase of metabolite extractions and media samples
– Lipid LC method – Organic phase of metabolite extraction
• MS Methods
– Positive and Negative mode
– Beam-type Collision Cell (BTCC) with wide energy range and ion trap (IT) fragmentation
Library 3/5
Searched Against Recurrent Unidentified Spectral (RUS) Library
• Now 49% of clusters are not identified.
• RUS Library
– Represents all detectable metabolites, both known and unknown.
– Covers all known fragmentation conditions and precursors.
– Combines CHO metabolites and media components because they are difficult to separate completely. Spectrum origin is labeled (CHO, media, both).
• Differences from Tandem Library:
– Contains unidentified spectra
– Metabolite extracts vs. single compound analyzed
Library 4/5
[Figure: ion plot, log10(Abundance) (6 to 11) vs. RT/min (0 to 30); legend: NIST 17 ID, Recurrent, No ID; searched against Recurrent Library]
High Charge (4+ or Higher) Clusters
[Figure: ion plot, log10(Abundance) (8 to 10) vs. RT/min (5 to 25); legend: Recurrent, No ID, Manual ID; labeled clusters: α Thymosin, β Thymosins, Ubiquitin]
• Present only in the 50% acetonitrile extractions
• Unexpected
– 79% are 1+
– 9% are 2+
– 6% are 3+
• Suspected large peptides/small proteins
• Red ones manually identified
Library 5/5
Improving MS/MS Identification Scoring
• Initially score was only based on quality of spectral match.
• Led to 3 types of match problems
1. In-source fragmentation match with a higher score compared to the non in-source match
2. Uncommon loss match with a higher score than a common neutral loss match
3. Hybrid search match with a higher score than direct match
• Solution – Changed scoring to give greater priority to some identifications with higher probability of being correct:
1. [M+H]+ precursor matches
2. [M+Na]+ and [M+H-NH3]+ precursor matches
3. [M+H-H2O]+ and [M]+ precursor matches
4. Hybrid search matches
5. Other in-source precursor matches
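The priority scheme above can be sketched as a two-level sort: first by the priority tier of the match type, then by spectral match score within a tier. The dictionary layout, field names and the rule that any unlisted precursor type falls into tier 5 are assumptions made for illustration. The three example hits and their scores are the tyrosine hits shown on the next slide.

```python
# Priority tiers from the slide; lower number means higher priority.
PRIORITY = {
    "[M+H]+": 1,
    "[M+Na]+": 2,
    "[M+H-NH3]+": 2,
    "[M+H-H2O]+": 3,
    "[M]+": 3,
    "hybrid": 4,
}
OTHER_IN_SOURCE = 5  # assumption: any unlisted precursor type is in-source


def rank_hits(hits):
    """Sort hits by priority tier, then by descending match score."""
    return sorted(
        hits,
        key=lambda h: (PRIORITY.get(h["match_type"], OTHER_IN_SOURCE), -h["score"]),
    )


hits = [
    {"name": "N-(α-Linolenoyl)tyrosine", "match_type": "[M+H-C18H28O]+", "score": 964},
    {"name": "N-Acetyl-L-tyrosine", "match_type": "[M+H-C2H2O]+", "score": 962},
    {"name": "L-Tyrosine", "match_type": "[M+H]+", "score": 958},
]

for hit in rank_hits(hits):
    print(hit["score"], hit["name"], hit["match_type"])
```

Sorting by score alone would put the in-source match at 964 on top; with the tiers, the [M+H]+ match for L-Tyrosine moves to the top despite its slightly lower score.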
ID 1/4
In-source Fragmentation Match with Higher Score
• Score 964: N-(α-Linolenoyl)tyrosine [M+H-C18H28O]+
• Score 962: N-Acetyl-L-tyrosine [M+H-C2H2O]+
• Score 958: L-Tyrosine [M+H]+
[Figure: MS/MS spectrum, relative abundance (0 to 100) vs. m/z (50 to 190); intensity scale 3.89E7; labeled peaks at m/z 119.0493, 123.0442, 136.0759, 147.0443, 165.0549 and 182.0815]
ID 2/4
Uncommon Loss Match with Higher Score
• Score 984: N-(α-Linolenoyl)tyrosine [M+H-C18H31ON]+
• Score 982: L-Tyrosine [M+H-NH3]+
[Figure: MS/MS spectrum, relative abundance (0 to 100) vs. m/z (50 to 170); intensity scale 6.84E7; labeled peaks at m/z 91.0544, 95.0493, 103.0544, 109.0650, 119.0493, 123.0442, 137.0599, 147.0443 and 165.0549]
ID 3/4
Priorities just put hits with higher probability on top.
Still need an analyst to determine if the top hit is the correct one.
Identification Reassignment
• Incorrect match types that the improved scoring did not resolve:
1. An uncommon metabolite or non-metabolite match with a higher score than a common one
2. Too few peaks in the spectrum to produce a match above the score cutoff of 400
• Incorrect match type introduced by the improved scoring:
– A protonated uncommon metabolite match with a higher score than a common metabolite with a common neutral loss
• Ex: protonated pyroglutamic acid instead of protonated glutamic acid with a water loss
• Solution:
– A semi-automated approach that re-assigns an identification based on a manually curated list of masses and their appropriate identifications
• IDs discovered by hybrid search also included
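The re-assignment step can be sketched as a lookup of the precursor m/z in a curated list. The list structure, the function name and the 5 ppm tolerance are assumptions; the slides say only that the approach is semi-automated and based on a manually curated list of masses, so an analyst review step is left out here. The single entry uses the glutamic acid example from the slide, with the m/z of 130.0499 calculated from the formula rather than taken from the source.

```python
# Hypothetical curated list: (precursor m/z, preferred identification).
# 130.0499 is the calculated m/z of glutamic acid [M+H-H2O]+, which is
# isobaric with pyroglutamic acid [M+H]+.
CURATED = [
    (130.0499, "L-Glutamic acid [M+H-H2O]+"),
]


def reassign(precursor_mz, current_id, curated=CURATED, tol_ppm=5.0):
    """Return the curated identification if the precursor m/z matches an entry."""
    for mz, preferred_id in curated:
        if abs(precursor_mz - mz) / mz * 1e6 <= tol_ppm:
            return preferred_id
    return current_id


print(reassign(130.0500, "Pyroglutamic acid [M+H]+"))
print(reassign(182.0815, "L-Tyrosine [M+H]+"))
```

Identifications whose precursor m/z is not on the list pass through unchanged, so the curated list only overrides the cases an analyst has already reviewed.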
ID 4/4
Annotation of Recurrent Unidentified Spectra
[Flowchart: steps "Remove if Peak Width >30s", "Remove if MaxPrecRelProd >10", "Remove if Spectral Purity <80" and "Determine if Peak is in a Blank" (Yes/No, with "Narrow Peak Width" and "Broad Peak Width" branches); labels assigned: Background, Insufficient Fragmentation, Contaminated, Artifact/Carryover, Unknown]
Annotation 1/2
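One possible reading of the annotation flowchart is sketched below. The thresholds (peak width above 30 s, MaxPrecRelProd above 10, spectral purity below 80) and the label names come from the slide. The pairing of each filter with a label, the order of the checks and the mapping of narrow and broad blank peaks to Artifact/Carryover and Background are assumptions, because the flowchart arrows did not survive in the transcript. The function and parameter names are hypothetical.

```python
def annotate(peak_width_s, max_prec_rel_prod, spectral_purity,
             in_blank, blank_peak_is_broad=False):
    """Assign an annotation label to an unidentified recurrent spectrum.

    The filter-to-label pairing and check order are assumed, not confirmed.
    """
    if peak_width_s > 30:
        return "Background"  # assumed pairing with the peak width filter
    if max_prec_rel_prod > 10:
        return "Insufficient Fragmentation"  # assumed pairing
    if spectral_purity < 80:
        return "Contaminated"  # assumed pairing
    if in_blank:
        # Assumed: broad peaks in the blank are background ions,
        # narrow peaks in the blank are artifacts or carryover.
        return "Background" if blank_peak_is_broad else "Artifact/Carryover"
    return "Unknown"


print(annotate(12, 2, 95, in_blank=False))
print(annotate(12, 2, 95, in_blank=True))
print(annotate(45, 2, 95, in_blank=False))
```

Under this reading, only spectra that pass all three quality filters and are absent from the blank keep the Unknown label.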
Examples of Annotations: EICs
[Figure: four extracted ion chromatogram (EIC) panels, relative abundance (0 to 100) vs. time (0 to 30 min); legend: Sample, Blank; intensity scales shown: 4.22E8, 8.83E7, 3.61E8, 5.67E7]
• Identified – Tyrosine [M+H]+: EIC of 182.0815
• Unknown: EIC of 302.1965
• Artifact/Carryover: EIC of 757.544
• Background: EIC of 111.0203
Annotation 2/2
Assigning Confidence – Med Initial Confidence
• Developed a qualitative confidence level to provide further information about the probability of the MS/MS match being correct
• Complementary to confidence levels proposed in the literature
[Flowchart: Score 600-799 gives an initial confidence of Med; Yes/No decisions "Is the identification a known metabolite?", "Is the spectrum annotated as unknown?" and "Is the spectrum annotated as artifact/carryover?" lead to a final level of High, Med or Low]
Confidence Levels in IDs: Schymanski, et al. Environmental Science & Technology 2014, 48 (4), 2097-8
Confidence 1/3
Workflow for High Initial Confidence
[Flowchart: Score 800-999 gives an initial confidence of High; Yes/No decisions "Is the identification a known metabolite?", "Is the spectrum annotated as unknown?" and "Is the spectrum annotated as artifact/carryover?" lead to a final level of High, Med or Low]
Confidence 2/3
Workflow for Low Initial Confidence
[Flowchart: Score 400-599 gives an initial confidence of Low; Yes/No decisions "Is the identification a known metabolite?", "Is the spectrum annotated as unknown?" and "Is the spectrum annotated as artifact/carryover?" lead to a final level of Med or Low]
Confidence 3/3
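The first step of all three workflows, binning the match score into an initial confidence, can be sketched directly from the score ranges on the slides (400-599 Low, 600-799 Med, 800-999 High). The function name is hypothetical, and returning None below the score cutoff of 400 is an assumption. The later Yes/No adjustments are not modeled, since their branch outcomes are not recoverable from the transcript.

```python
def initial_confidence(score):
    """Map an MS/MS match score to the initial confidence level."""
    if 800 <= score <= 999:
        return "High"
    if 600 <= score <= 799:
        return "Med"
    if 400 <= score <= 599:
        return "Low"
    return None  # assumption: below the cutoff of 400 there is no identification


for score in (958, 650, 420, 310):
    print(score, initial_confidence(score))
```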
Summary
• Created a recurrent spectral library for CHO cell metabolites and media
• Improved MS/MS identification accuracy
• Developed a strategy to annotate unidentified spectra
• Developed a strategy to assign confidence to MS/MS IDs
• Next Steps:
– Make library available at https://chemdata.nist.gov/
– Automate the annotation and assignment of confidence
– Apply library to analysis of samples
• Manuscript is being prepared
CHO Cell Growth:
Biomolecular Labeling Lab (BL2) at the Institute for Bioscience and Biotechnology Research (IBBR)
– Zvi Kelman
– Renae Preston
– Lila Kashi
Thanks for your Attention
Have questions or want to talk?
BOOTH 616 – Today from 2:30-5:00 pm
Tytus Mak – WOA 3:50 pm, here